Foreword


As its name implies, the theory of fuzzy sets is, basically, a theory of graded con- 
cepts—a theory in which everything is a matter of degree or, to put it figuratively, 
everything has elasticity. 

In the two decades since its inception, the theory has matured into a wide- 
ranging collection of concepts and techniques for dealing with complex phe- 
nomena that do not lend themselves to analysis by classical methods based on 
probability theory and bivalent logic. Nevertheless, a question that is frequently 
raised by the skeptics is: Are there, in fact, any significant problem-areas in which 
the use of the theory of fuzzy sets leads to results that could not be obtained by 
classical methods? 

Professor Zimmermann’s treatise provides an affirmative answer to this ques- 
tion. His comprehensive exposition of both the theory and its applications 
explains in clear terms the basic concepts that underlie the theory and how they 
relate to their classical counterparts. He shows through a wealth of examples the 
ways in which the theory can be applied to the solution of realistic problems, par- 
ticularly in the realm of decision analysis, and motivates the theory by applica- 
tions in which fuzzy sets play an essential role. 

An important issue in the theory of fuzzy sets that does not have a counterpart 
in the theory of crisp sets relates to the combination of fuzzy sets through disjunc- 
tion and conjunction or, equivalently, union and intersection. Professor Zimmer- 
mann and his associates at the Technical University of Aachen have made many 
important contributions to this problem and were the first to introduce the concept 
of a parametric family of connectives that can be chosen to fit a particular applica- 
tion. In recent years, this issue has given rise to an extensive literature dealing with 
t-norms and related concepts that link some aspects of the theory of fuzzy sets to 
the theory of probabilistic metric spaces developed by Karl Menger. 

Another important issue addressed in Professor Zimmermann’s treatise relates
to the distinction between the concepts of probability and possibility, with the 
latter concept having a close connection with that of membership in a fuzzy set. 
The concept of possibility plays a particularly important role in the representa- 
tion of meaning, in the management of uncertainty in expert systems, and in appli- 
cations of the theory of fuzzy sets to decision analysis. 

As one of the leading contributors to and practitioners of the use of fuzzy sets 
in decision analysis, Professor Zimmermann is uniquely qualified to address 
the complex issues arising in fuzzy optimization problems and, especially, 
fuzzy mathematical programming and multicriterion decision making in a fuzzy 
environment. His treatment of these topics is comprehensive, up-to-date, and 
illuminating. 

In sum, Professor Zimmermann’s treatise is a major contribution to the liter- 
ature of fuzzy sets and decision analysis. It presents many original results and 
incisive analyses. And, most importantly, it succeeds in providing an excellent 
introduction to the theory of fuzzy sets—an introduction that makes it possible 
for an uninitiated reader to obtain a clear view of the theory and learn about its 
applications in a wide variety of fields. 

The writing of this book was a difficult undertaking. Professor Zimmermann 
deserves to be congratulated on his outstanding accomplishment and thanked for 
contributing so much over the past decade to the advancement of the theory of 
fuzzy sets as a scientist, educator, administrator, and organizer. 


L.A. Zadeh 


Preface 


Since its inception 20 years ago, the theory of fuzzy sets has advanced in a variety 
of ways and in many disciplines. Applications of this theory can be found, for 
example, in artificial intelligence, computer science, control engineering, deci- 
sion theory, expert systems, logic, management science, operations research, 
pattern recognition, and robotics. Theoretical advances have been made in many 
directions. In fact it is extremely difficult for a newcomer to the field or for some- 
body who wants to apply fuzzy set theory to his problems to recognize properly 
the present “state of the art.” Therefore, many applications use fuzzy set theory 
on a much more elementary level than appropriate and necessary. On the other 
hand, theoretical publications are already so specialized and assume such a back- 
ground in fuzzy set theory that they are hard to understand. The more than 4,000 
publications that exist in the field are widely scattered over many areas and in 
many journals. Existing books are edited volumes containing specialized contri- 
butions or monographs that focus only on specific areas of fuzzy sets, such as 
pattern recognition [Bezdek 1981], switching functions [Kandel and Lee 1979], 
or decision making [Kickert 1978]. Even the excellent survey book by Dubois 
and Prade [1980a] is primarily intended as a research compendium for insiders 
rather than an introduction to fuzzy set theory or a textbook. This lack of a com- 
prehensive and modern text is particularly recognized by newcomers to the field 
and by those who want to teach fuzzy set theory and its applications. 

The primary goal of this book is to help to close this gap—to provide a 
textbook for courses in fuzzy set theory and a book that can be used as an 
introduction. 

One of the areas in which fuzzy sets have been applied most extensively is in 
modeling for managerial decision making. Therefore, this area has been selected 
for more detailed consideration. The information has been divided into two 
volumes. The first volume contains the basic theory of fuzzy sets and some areas
of application. It is intended to provide extensive coverage of the theoretical and 
applicational approaches to fuzzy sets. Sophisticated formalisms have not been 
included. I have tried to present the basic theory and its extensions in enough 
detail to be comprehended by those who have not been exposed to fuzzy set 
theory. Examples and exercises serve to illustrate the concepts even more clearly. 
For the interested or more advanced reader, numerous references to recent liter- 
ature are included that should facilitate studies of specific areas in more detail 
and on a more advanced level. 

The second volume is dedicated to the application of fuzzy set theory to the 
area of human decision making. It is self-contained in the sense that all concepts 
used are properly introduced and defined. Obviously this cannot be done in the 
same breadth as in the first volume. Also the coverage of fuzzy concepts in the 
second volume is restricted to those that are directly used in the models of deci- 
sion making. 

It is advantageous but not absolutely necessary to go through the first volume 
before studying the second. The material in both volumes has served as texts in 
teaching classes in fuzzy set theory and decision making in the United States and 
in Germany. Each time the material was used, refinements were made, but the 
author welcomes suggestions for further improvements. 

The target groups were students in business administration, management 
science, operations research, engineering, and computer science. Even though no 
specific mathematical background is necessary to understand the books, it is 
assumed that the students have some background in calculus, set theory, opera- 
tions research, and decision theory. 

I would like to acknowledge the help and encouragement of all the students, 
particularly those at the Naval Postgraduate School in Monterey and at the Insti- 
tute of Technology in Aachen (F.R.G.), who improved the manuscripts before 
they became textbooks. I also thank Mr. Hintz, who helped to modify the differ- 
ent versions of the book, worked out the examples, and helped to make the text 
as understandable as possible. Ms. Grefen typed the manuscript several times 
without losing her patience. I am also indebted to Kluwer Academic Publishers 
for making the publication of this book possible. 


H.-J. Zimmermann 


Preface for the Revised Edition 


Since this book was first published in 1985, Fuzzy Set Theory has had an unex- 
pected growth. It was further developed theoretically and it was applied to new 
areas. A number of very good books have appeared, primarily dedicated to special
areas such as Possibility Theory [Dubois and Prade 1988a], Fuzzy Control
[Sugeno 1985a; Pedrycz 1989], Behavioral and Social Sciences [Smithson 1987],
and others. Many new edited volumes, either dedicated to
special areas or with a much wider scope, have been added to the existing ones. 
Thousands of articles have been published on fuzzy sets in various journals. Suc- 
cessful real applications of fuzzy set theory have also increased in number and 
in quality. In particular, applications of fuzzy control, fuzzy computers, expert 
system shells with capabilities to process fuzzy information, and fuzzy decision 
support systems have become known and have partly already proven their supe- 
riority over more traditional tools. 

One thing, however, does not seem to have changed since 1985: access to the 
area has not become easier for newcomers. I do not know of any introductory yet 
comprehensive book or textbook that will facilitate entering into the area of fuzzy 
sets or that can be used in classwork. 

I am, therefore, very grateful to Kluwer Academic Publishers for having 
agreed to publish a revised edition of the book, which has already been printed
four times without improvement. In this revised edition all typing and other errors
have been eliminated. All chapters have been updated. The chapters on possibil- 
ity theory (8), on fuzzy logic and approximate reasoning (9), on expert systems 
and fuzzy control (10), on decision making (12), and on fuzzy set models in oper- 
ations research (13) have been restructured and rewritten. Exercises have been 
added to almost all chapters and a teacher’s manual is available on request. 

The intention of the book, however, has not changed: While the second volume 
[Zimmermann 1987] focuses on decision making and expert systems and introduces
fuzzy set theory only where and to the extent that it is needed, this book
tries to offer a didactically prepared text which requires hardly any special math- 
ematical background of the reader. It tries to introduce fuzzy set theory as com- 
prehensively as possible, without delving into very theoretical areas or presenting 
any mathematical proofs which do not contribute to a better understanding. It 
rather offers numerical examples wherever possible. I would very much like to
thank Mr. C. von Altrock, Ms. B. Lelke, Mr. R. Weber, and Dr. B. Werners for
their active participation in preparing this revised edition. Mr. Andrée and Mr. 
Lehmann kindly prepared the figures. Ms. Oed typed and retyped manuscripts 
over and over again and helped us to arrive at the final manuscript of the book. 
We are all obliged to Kluwer Academic Publishers for the opportunity to publish 
this volume and for the good cooperation in preparing it. 


H.-J. Zimmermann 


Preface to the Third Edition 


The development of fuzzy set theory to fuzzy technology during the first half of 
the 1990s has been very fast. More than 16,000 publications have appeared since 
1965. Most of them have advanced the theory in many areas. Quite a number of 
these publications describe, however, applications of fuzzy set theory to existing 
methodology or to real problems. In addition, the transition from fuzzy set theory 
to fuzzy technology has been achieved by providing numerous software and hard- 
ware tools that considerably improve the design of fuzzy systems and make them 
more applicable in practice. Since 1994, fuzzy set theory, artificial neural nets, 
and genetic algorithms have also moved closer together and are now normally 
called “computational intelligence.” All these changes have made this technol- 
ogy more powerful but also more complicated and have raised the “entrance 
barrier” even higher. This is particularly regrettable since more and more uni- 
versities and other educational institutions are including fuzzy set theory in their 
programs. In some countries, a large number of introductory books have been 
published; in Germany, for instance, 25 such books were published in 1993 and 
1994. English textbooks, however, are still very much lacking. 

Therefore, I appreciate very much that Kluwer Academic Publishers has 
agreed to publish a third edition of this book, which updates the second revised 
edition. 

New developments, to the extent that they are relevant for a basic textbook, 
have been included. All chapters have been updated. Chapters 9, 10, 11, and 12 
have been completely rewritten. Nevertheless, I have tried not to let the book 
grow beyond a basic textbook. To reconcile the conflict between the nature of a 
textbook and the fast growth of the area, many references have been added to 
facilitate deeper insights for the interested reader. 

I would like to thank Mr. Tore Grünert for his active participation and
contributions, particularly to chapter 11, and all my coworkers for helping to
proofread the book and to prepare new figures. We all hope that this third
edition will benefit future students and accelerate the broader acceptance of
fuzzy set theory.


Aachen, April 1995 


H.-J. Zimmermann 


Preface to the Fourth Edition 


The new Millennium starts with over 30,000 publications in the area of
“computational intelligence” or “soft computing”. These terms were coined in
the first half of the 90s, when fuzzy set theory, neural networks and
evolutionary computing joined forces because they felt that there were strong
synergies between these areas. This is certainly true, in spite of the fact that
evolutionary computing has its strength in optimization, neural nets are
particularly strong in pattern recognition and automatic learning, whereas fuzzy
set theory has its strength in modeling, interfacing humans with computers, and
modeling certain uncertainties. Particularly between fuzzy set theory and neural
nets, the synergies have been used to develop hybrid models and methods that
combine the strengths of both of these areas.

Nevertheless, all three areas are continuing to develop new approaches in 
their own areas. For a textbook it would be inadequate to cover the basics 
of fuzzy set theory and also the vast area of computational intelligence or 
hybrid fuzzy-neuro methods or it would have to do this on a very superficial 
level. Hence, this book is restricted to the theory and application of fuzzy set 
theory only. 

Apart from the convergence of the three areas mentioned above, two main
developments can probably be observed for fuzzy sets:


1. There is a widening gap between the mathematics of fuzzy set theory and 
fuzzy technology, as the more applied version of fuzzy set theory. The still 
very strong activities in the theoretical direction lead to more and more very 
specific mathematical developments, which is natural, legitimate and cer- 
tainly also important. The applicational relevance of these research results, 
however, is often not obvious and only perceivable by very advanced and 
specialized theoreticians. New developments in fuzzy technology follow
more the needs of a changed industrial environment. 

2. One of the major changes in the industrial environment since the early 90s is
that we have moved from a situation of a lack of (electronically readable) data
to one of an abundance of such data. Together with the dramatic increase in the
power of electronic data processing and web technology, this has led fuzzy
technology from a focus on modeling to a concentration on complexity reduction,
i.e., pattern recognition, data mining, and automatic knowledge discovery.
This situation is mirrored in this edition of the book by an extension of the 
chapter on data mining and a new chapter on fuzzy sets in data bases. 


The following figure indicates the development of fuzzy set theory from another 
point of view: 

As shown there, the time lag between theory, application, and the development
of a fuzzy technology (with efficient CASE-tools for the development of fuzzy 
systems), was, roughly speaking, ten years each. 

This was valid until the first “fuzzy booms” occurred in the first half of the 90s. 
Until then the development of applications and technology centered very much 
around fuzzy control, a concept that was very applicable, easy to understand, and,
therefore, attractive to many industrial practitioners and the broad public. 

Since the start of computational intelligence, theoretical as well as
application-oriented developments have become much more diversified, and clear
lead times between theoretical development and application can no longer be
recognized.

I have used the opportunity of a fourth edition of this textbook, for which I 
am very grateful to Kluwer Academic Publishers, to adapt the book to the new 
developments, without exceeding the scope of a basic textbook, as follows: 

All chapters have been updated. The scope of part I has only been extended
with respect to t-norms, other operators and uncertainty modeling because I am 
convinced that chapters 2 to 7 are still sufficient as a mathematical basis to under- 
stand all new developments in this area and also for part II of the book, where 
the major changes and extensions of this edition can be found: 

In chapter 10 the modeling of uncertainty in expert systems was extended 
because this component has gained importance in practice. 

In chapter 11 primarily a section for defuzzification has been added for the 
same reason. 

Chapter 12 has been added because the application of fuzzy technology in 
information processing is already important and will certainly increase in impor- 
tance in the future. 

Chapter 13 has been extended by explaining new methodological develop- 
ments in dynamic fuzzy data analysis, which will also be of growing importance 
in the future. 

Finally, the applications in chapter 15 have been completely restructured by
deleting some, adding others, and classifying all of them differently. This was
necessary because the focus of applications has shifted strongly, for reasons
explained in that chapter, from “engineering intelligence” to “business intelligence”.

Of course, the index and the references have also been updated and extended. 

This time I would like to thank again Kluwer Academic Publishers for giving 
me the chance of a fourth edition and Dr. Angstenberger for her excellent research 
cooperation and for letting me use one application from her book. 

In particular, I would like to thank Ms. Katja Palczynski for her outstanding 
help to get the manuscripts ready for the publisher. 

I hope that this new edition of my textbook will help to keep the respective
courses in universities and elsewhere up-to-date, challenging, and motivating for
students as well as professors. It may also be useful for practitioners who want
to update their knowledge of fuzzy technology and look for new applications in
their area.


Aachen, April 2001 
H.-J. Zimmermann 


INTRODUCTION TO 
FUZZY SETS 


1.1 Crispness, Vagueness, Fuzziness, Uncertainty 


Most of our traditional tools for formal modeling, reasoning, and computing are 
crisp, deterministic, and precise in character. By crisp we mean dichotomous, that 
is, yes-or-no type rather than more-or-less type. In conventional dual logic, for
instance, a statement can be true or false—and nothing in between. In set theory, 
an element can either belong to a set or not; and in optimization, a solution is 
either feasible or not. Precision assumes that the parameters of a model represent 
exactly either our perception of the phenomenon modeled or the features of the 
real system that has been modeled. Generally, precision also implies that the 
model is unequivocal, that is, that it contains no ambiguities. 
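
The contrast between yes-or-no and more-or-less descriptions can be sketched in a few lines of code. The sketch below is only an illustration: the function names, the threshold of 180 cm, and the linear ramp between 160 cm and 190 cm are assumptions chosen for the example and are not taken from the text.

```python
def crisp_tall(height_cm: float, threshold_cm: float = 180.0) -> int:
    """Dichotomous (crisp) description: 1 if the element belongs to the set, else 0.

    The threshold is an illustrative assumption.
    """
    return 1 if height_cm >= threshold_cm else 0


def graded_tall(height_cm: float, low_cm: float = 160.0, high_cm: float = 190.0) -> float:
    """More-or-less description: a degree between 0 and 1.

    The linear ramp between low_cm and high_cm is an illustrative assumption.
    """
    if height_cm <= low_cm:
        return 0.0
    if height_cm >= high_cm:
        return 1.0
    return (height_cm - low_cm) / (high_cm - low_cm)


if __name__ == "__main__":
    for h in (155, 175, 179, 181, 195):
        # The crisp value jumps at the threshold; the graded value changes gradually.
        print(h, crisp_tall(h), round(graded_tall(h), 2))
```

In the crisp description, heights of 179 cm and 181 cm fall on opposite sides of the boundary although they differ by only 2 cm, whereas the graded description assigns them nearly the same degree.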

Certainty, finally, indicates that we assume the structures and parameters
of the model to be definitely known, and that there are no doubts about their 
values or their occurrence. If the model under consideration is a formal model 
[Zimmermann 1980, p. 127], that is, if it does not pretend to model reality ade- 
quately, then the model assumptions are in a sense arbitrary, that is, the model 
builder can freely decide which model characteristics he chooses. If, however, 
the model or theory asserts factuality [Popper 1959; Zimmermann 1980], that is, 
if conclusions drawn from these models have a bearing on reality and are 
2 FUZZY SET THEORY—AND ITS APPLICATIONS


supposed to model reality adequately, then the modeling language has to be suited 
to model the characteristics of the situation under study appropriately. 

The paramount importance of the modeling language is recognized by Apostel, when
he says: 


The relationship between formal languages and domains in which they have models 
must in the empirical sciences necessarily be guided by two considerations that are by 
no means as important in the formal sciences: 

(a) The relationship between the language and the domain must be closer because they 
are in a sense produced through and for each other; 

(b) extensions of formalisms and models must necessarily be considered because 
everything introduced is introduced to make progress in the description of the 
objects studied. Therefore we should say that the formalization of the concept of 
approximate constructive necessary satisfaction is the main task of semantic study 
of models in the empirical sciences. [Apostel 1961, p. 26] 


Because we request that a modeling language be unequivocal and nonredun- 
dant on one hand and, at the same time, catch semantically in its terms all that 
is important and relevant for the model, we seem to have the following problem. 
Human thinking and feeling, in which ideas, pictures, images, and value systems 
are formed, first of all certainly has more concepts or comprehensions than our 
daily language has words. If one considers, in addition, that for a number of 
notions we use several words (synonyms), then it becomes quite obvious that the 
power (in a set-theoretic sense) of our thinking and feeling is much higher than 
the power of a living language. If in turn we compare the power of a living lan- 
guage with the logical language, then we will find that logic is even poorer. There- 
fore it seems to be impossible to guarantee a one-to-one mapping of problems 
and systems in our imagination and in a model using a mathematical or logical 
language. 

One might object that logical symbols can arbitrarily be filled with semantic 
contents and that by doing so the logical language becomes much richer. It will 
be shown that it is very often extremely difficult to appropriately assign seman- 
tic contents to logical symbols. 

The usefulness of the mathematical language for modeling purposes is undis- 
puted. However, there are limits to the usefulness and the possibility of using 
classical mathematical language, based on the dichotomous character of set 
theory, to model particular systems and phenomena in the social sciences: “There 
is no idea or proposition in the field, which can not be put into mathematical lan- 
guage, although the utility of doing so can very well be doubted” [Brand 1961]. 
Schwarz [1962] brings up another argument against the nonreflective use of math- 
ematics when he states: “An argument, which is only convincing if it is precise 
loses all its force if the assumptions on which it is based are slightly changed, 
while an argument, which is convincing but imprecise may well be stable under
small perturbations of its underlying axioms.” For factual models or modeling 
languages, two major complications arise: 


1. Real situations are very often not crisp and deterministic, and they cannot be 
described precisely. 

2. The complete description of a real system often would require far more 
detailed data than a human being could ever recognize simultaneously, 
process, and understand. 


This situation has already been recognized by thinkers in the past. In 1923 the 
philosopher B. Russell [1923] referred to the first point when he wrote: 


All traditional logic habitually assumes that precise symbols are being employed. 
It is therefore not applicable to this terrestrial life but only to an imagined 
celestial existence. 


L. Zadeh referred to the second point when he wrote, “As the complexity of 
a system increases, our ability to make precise and yet significant statements 
about its behaviour diminishes until a threshold is reached beyond which preci- 
sion and significance (or relevance) become almost mutually exclusive charac- 
teristics.” [Zadeh 1973a] 

Let us consider characteristic features of real-world systems again: Real 
situations are very often uncertain or vague in a number of ways. Due to lack of 
information, the future state of the system might not be known completely. This 
type of uncertainty (stochastic character) has long been handled appropriately by 
probability theory and statistics. This Kolmogoroff-type probability is essentially 
frequentistic and is based on set-theoretic considerations. Koopman’s probability 
refers to the truth of statements and therefore is based on logic. In both types of 
probabilistic approaches, however, it is assumed that the events (elements of sets) 
or the statements, respectively, are well defined. We shall call this type of uncer- 
tainty or vagueness stochastic uncertainty in contrast to the vagueness con- 
cerning the description of the semantic meaning of the events, phenomena, or 
statements themselves, which we shall call fuzziness. 
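
The difference between the two kinds of vagueness can be illustrated with a small sketch. In the first case the event is well defined and only its occurrence is unknown; in the second the observed value is known exactly, but the term applied to it is vague. The temperature example, the function names, and all numerical values are assumptions made for illustration only.

```python
import random


def relative_frequency(samples, event):
    """Frequentist estimate: share of samples for which a well-defined event holds."""
    return sum(1 for s in samples if event(s)) / len(samples)


def hot_degree(temp_c, low=20.0, high=35.0):
    """Degree to which an exactly known temperature counts as "hot".

    The linear ramp between low and high is an illustrative assumption.
    """
    return min(1.0, max(0.0, (temp_c - low) / (high - low)))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that repeated runs agree
    temps = [rng.gauss(25.0, 5.0) for _ in range(10_000)]

    # Stochastic uncertainty: "temperature above 30" is a crisp event;
    # we are only unsure whether it will occur.
    print(relative_frequency(temps, lambda t: t > 30.0))

    # Fuzziness: the temperature is known to be 29, but "hot" has no sharp boundary.
    print(hot_degree(29.0))
```

The first number answers "how often does a sharply defined event occur?"; the second answers "how well does a known value fit a vague description?". The two questions are independent of each other.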

Fuzziness can be found in many areas of daily life, such as in engineering [see, 
for instance, Blockley 1980], medicine [see Vila and Delgado 1983], meteorol- 
ogy [Cao and Chen 1983], manufacturing [Mamdani 1981], and others. It is 
particularly frequent, however, in all areas in which human judgment, evaluation, 
and decisions are important. These are the areas of decision making, reasoning, 
learning, and so on. Some reasons for this fuzziness have already been mentioned. 
Others are that most of our daily communication uses “natural languages,” and
a good part of our thinking is done in it. In these natural languages, the meaning
of words is very often vague. The meaning of a word might even be well defined, 
but when using the word as a label for a set, the boundaries within which objects 
do or do not belong to the set become fuzzy or vague. Examples are words such 
as “birds” (how about penguins, bats, etc.?) or “red roses,” but also terms such 
as “tall men,” “beautiful women,” and “creditworthy customers.” In this context 
we can probably distinguish two kinds of fuzziness with respect to their origins: 
intrinsic fuzziness and informational fuzziness. The former is the fuzziness 
to which Russell’s remark referred, and it is illustrated by “tall men.” This term 
is fuzzy because the meaning of tall is fuzzy and dependent on the context (height
of observer, culture, etc.). An example of the latter is the term “creditworthy 
customers”: A creditworthy customer can possibly be described completely 
and crisply if we use a large number of descriptors. These descriptors are 
more, however, than a human being could handle simultaneously. Therefore 
the term, which in psychology is called a “subjective category,” becomes fuzzy. 
One could imagine that the subjective category “creditworthiness” is decomposed 
into two smaller subjective categories, each of which needs fewer descriptors 
to be completely described. This process of decomposition could be continued 
until the descriptions of the subjective categories generated are reasonably 
defined. On the other hand, the notion “creditworthiness” could be constructed 
by starting with the smallest subjective subcategories and aggregating them 
hierarchically. 

For creditworthiness the concept structure shown in figure 1-1, which has
a symmetrical structure, was developed in consultation with 50 credit clerks 
of banks. 

Credit experts distinguish between the financial basis and the personality of 
an applicant. “Financial basis” comprises all realities, movables, assets, liquid 
funds, and others. The evaluation of the economic situation depends on the actual 
securities, that is, the difference between property and debts, and on the liquid- 
ity, that is, the continuous difference between income and expenses. 

On the other hand, “personality” denotes the collection of traits by which a 
potent and serious person is distinguished. The achievement potential is based on 
mental and physical capacity as well as on the individual’s motivation. The busi- 
ness conduct includes economical standards. While the former means the setting 
of realistic goals, reasonable planning, and criteria of economic success, the latter 
is directed toward the applicant’s disposition to obey business laws and mutual 
agreements. Hence a creditworthy person lives in secure circumstances and
guarantees a successful, profit-oriented cooperation (see figure 1-1).
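
The hierarchical construction of a subjective category from smaller subcategories can be sketched as a tree whose leaves carry degrees between 0 and 1. The category names follow the description above; the leaf values are invented for illustration, and the two aggregation rules shown (minimum and arithmetic mean) are merely assumed candidates, since the text leaves the type of aggregation open at this point.

```python
from statistics import mean
from typing import Callable, Dict, Sequence, Union

# A node is either a leaf degree in [0, 1] or a dictionary of named subcategories.
Node = Union[float, Dict[str, "Node"]]

# Leaf degrees are invented for illustration only.
creditworthiness: Dict[str, Node] = {
    "financial basis": {"security": 0.8, "liquidity": 0.6},
    "personality": {"achievement potential": 0.7, "business conduct": 0.9},
}


def aggregate(node: Node, combine: Callable[[Sequence[float]], float]) -> float:
    """Aggregate the tree bottom-up with the given combination rule."""
    if isinstance(node, dict):
        return combine([aggregate(child, combine) for child in node.values()])
    return float(node)


if __name__ == "__main__":
    # Minimum: the overall degree is limited by the weakest subcategory.
    print(aggregate(creditworthiness, min))
    # Arithmetic mean: weak and strong subcategories compensate each other.
    print(aggregate(creditworthiness, mean))
```

Passing the combination rule as a parameter keeps the structure of the hierarchy separate from the question of how the degrees are to be aggregated.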

Before turning to fuzzy set theory it should, however, be stressed that
uncertainty is a multifaceted phenomenon and that modeling it in
application-oriented models requires considerable investigation before we start
the modeling process. Also, the available modeling tools are not limited to
probability theory and fuzzy set theory. We shall consider this fact in more
detail in chapter 8.

Figure 1-1. Concept hierarchy of creditworthiness.

In chapter 16 we will return to this figure and elaborate on the type 
of aggregation. 


1.2 Fuzzy Set Theory 


The first publications in fuzzy set theory by Zadeh [1965] and Goguen [1967, 
1969] show the intention of the authors to generalize the classical notion of a set 
and a proposition [statement] to accommodate fuzziness in the sense described 
in section 1.1. 

Zadeh [1965, p. 339] writes, “The notion of a fuzzy set provides a convenient
point of departure for the construction of a conceptual framework which parallels
in many respects the framework used in the case of ordinary sets, but is more
general than the latter and, potentially, may prove to have a much wider scope
of applicability, particularly in the fields of pattern classification and information
processing. Essentially, such a framework provides a natural way of dealing with
problems in which the source of imprecision is the absence of sharply defined
criteria of class membership rather than the presence of random variables.”

“Imprecision” here is meant in the sense of vagueness rather than the lack of 
knowledge about the value of a parameter (as in tolerance analysis). Fuzzy set 
theory provides a strict mathematical framework (there is nothing fuzzy about 
fuzzy set theory!) in which vague conceptual phenomena can be precisely and 
rigorously studied. It can also be considered as a modeling language well suited 
for situations in which fuzzy relations, criteria, and phenomena exist. 

Fuzziness has so far not been defined uniquely semantically, and probably
never will be. It will mean different things, depending on the application area
and the way it is measured. In the meantime, numerous authors have contributed
to this theory. By 1984 as many as 4,000 publications already existed, and by
2000 there were more than 30,000.

The specialization of those publications is steadily increasing, making it more
and more difficult for newcomers to this area to find a good entry and to
understand and appreciate the philosophy, formalism, and application potential
of this theory. Roughly speaking, fuzzy set theory in the last two decades has
developed along two lines:


1. As a formal theory that, when maturing, became more sophisticated and
specified and was enlarged by original ideas and concepts as well as by
“embracing” classical mathematical areas such as algebra, graph theory,
topology, and so on by generalizing (fuzzifying) them.

2. As an application-oriented “fuzzy technology”, i.e. as a tool for modeling, 
problem solving and data mining that has proven superior to existing methods 
in many cases and as an attractive “add-on” to classical approaches in other 
cases. 


In this context it may be useful to cite and briefly comment on the major goals
of this technology and to correct the still very common view that fuzzy set
theory or fuzzy technology is exclusively or primarily useful to model
uncertainty:


a) Modeling of uncertainty 


This is certainly the best known and oldest goal. I am not sure, however, whether 
it can (still) be considered to be the most important goal of fuzzy set theory. 
Uncertainty has been a very important topic for several centuries. There are
numerous methods and theories which claim to be the only proper tool to model
uncertainties. In general, however, they either do not define what is meant by
“uncertainty” sufficiently, or define it only in a very specific and limited
sense. I believe that uncertainty, if considered as a subjective phenomenon, can
and ought to be modeled by very different theories, depending on the causes of
uncertainty, the type and quantity of available information, the requirements of
the observer etc.
In this sense fuzzy set theory is certainly also one of the theories which can be 
used to model specific types of uncertainty under specific types of circumstances. 
It might then compete with other theories, but it might also be the most appro- 
priate way to model this phenomenon for well-specified situations. It would 
certainly exceed the scope of this article to discuss this question in detail here 
[Zimmermann 1997]. 


b) Relaxation 


Classical models and methods are normally based on dual logic. They, therefore, 
distinguish between feasible and infeasible, belonging to a cluster or not, optimal 
or suboptimal etc. Often this view does not capture reality adequately. Fuzzy set 
theory has been used extensively to relax or generalize classical methods from a 
dichotomous to a gradual character. Examples of this are fuzzy mathematical pro- 
gramming [Zimmermann 1996], fuzzy clustering [Bezdek and Pal 1992], fuzzy 
Petri Nets [Lipp et al. 1989], fuzzy multi criteria analysis [Zimmermann 1986]. 


c) Compactification 


Due to the limited capacity of the human short-term memory or of technical
systems, it is often not possible either to store all relevant data or to
present masses of data to a human observer in such a way that he or she can
perceive the information contained in these data. Fuzzy technology has been used
to reduce the complexity of data to an acceptable degree, usually either via
linguistic variables or via fuzzy data analysis (fuzzy clustering etc.).


d) Meaning Preserving Reasoning 


Expert system technology has been in use for two decades and has led in many
cases to disappointment. One of the reasons for this might be that expert
systems in their inference engines, when they are based on dual logic, perform
symbol processing (truth values true or false) rather than knowledge processing.
In approximate reasoning, meanings are attached to words and sentences via
linguistic variables. Inference engines then have to be able to process
meaningful linguistic expressions, rather than symbols, and arrive at membership
functions of fuzzy sets, which can then be retranslated into words and sentences
via linguistic approximation.


e) Efficient Determination of Approximate Solutions 


As early as the 1970s, Prof. Zadeh expressed his intention to have fuzzy set
theory considered as a tool to determine approximate solutions of real problems
in an efficient or affordable way. For a long time this goal was not really
achieved. In the recent past, however, cases have become known which are very
good examples of it. Bardossy [1996], for instance, showed in the context of
water flow modeling that it can be much more efficient to use fuzzy rule-based
systems to solve the problems than systems of differential equations. A
comparison of the results achieved by these two alternative approaches showed
that their accuracy was almost the same for all practical purposes. This is
particularly true if one considers the inaccuracies and uncertainties contained
in the input data.


It seems desirable that an introductory textbook be available to help students 
get started and find their way around. Obviously, such a textbook cannot cover 
the entire body of the theory in appropriate detail. The present book will there- 
fore proceed as follows: 

Part I of this book, containing chapters 2 to 8, will develop the formal frame- 
work of fuzzy mathematics. Due to space limitations and for didactical reasons, 
two restrictions will be observed: 


1. Topics that are of high mathematical interest but require a very solid math- 
ematical background and those that are not of obvious relevance to applica- 
tions will not be discussed. 

2. Most of the discussion will proceed along the lines of the early concepts of 
fuzzy set theory. At appropriate times, however, the additional potential of 
fuzzy set theory that arises by using other axiomatic frameworks resulting in 
other operators will be indicated or described. The character of these chap- 
ters will obviously have to be formal. 


Part II of the book, chapters 9 to 16, will then survey the most interesting 
applications of fuzzy set theory. At that stage the student should be in a position 
to recognize possible extensions and improvements of the applications presented. 


I FUZZY MATHEMATICS


The first part of this book is devoted to the formal framework of the theory of
fuzzy sets. Chapter 2 provides basic definitions of fuzzy sets and algebraic oper- 
ations that will then serve for further considerations. Even though we shall use 
one version of terminology and one set of symbols consistently throughout the 
book, alternative ways of denoting fuzzy sets will be mentioned because they 
have become common. Chapter 3 extends the basic theory of fuzzy sets by intro- 
ducing additional concepts and alternative operators. Chapter 4 is devoted to 
fuzzy measures, measures of fuzziness, and other important measures that are 
needed for applications presented either in Part II of this book or in the second 
volume on decision making in a fuzzy environment. Chapter 5 introduces the 
extension principle, which will be very useful for the following chapters and 
covers fuzzy arithmetic. Chapters 6 and 7 will then treat fuzzy relations, graphs, 
and functions. Chapter 8 focuses on uncertainty modeling and some special 
topics, such as the relationship between fuzzy set theory, probability theory, and 
other classical areas. 


2 FUZZY SETS—BASIC 
DEFINITIONS 


2.1 Basic Definitions 


A classical (crisp) set is normally defined as a collection of elements or
objects x ∈ X that can be finite, countable, or uncountable. Each single element
can either belong to or not belong to a set A, A ⊆ X. In the former case, the
statement “x belongs to A” is true, whereas in the latter case this statement is
false.

Such a classical set can be described in different ways: one can either enu- 
merate (list) the elements that belong to the set; describe the set analytically, for 
instance, by stating conditions for membership (A = {x|x < 5}); or define the 
member elements by using the characteristic function, in which 1 indicates mem- 
bership and 0 nonmembership. For a fuzzy set, the characteristic function allows 
various degrees of membership for the elements of a given set. 


Definition 2-1 


If X is a collection of objects denoted generically by x, then a fuzzy set A in
X is a set of ordered pairs:

$$A = \{(x, \mu_A(x)) \mid x \in X\}$$

$\mu_A(x)$ is called the membership function or grade of membership (also degree
of compatibility or degree of truth) of x in A that maps X to the membership
space M. (When M contains only the two points 0 and 1, A is nonfuzzy and
$\mu_A(x)$ is identical to the characteristic function of a nonfuzzy set.) The
range of the membership function is a subset of the nonnegative real numbers
whose supremum is finite. Elements with a zero degree of membership are normally
not listed.


Example 2-1a


A realtor wants to classify the houses he offers to his clients. One indicator
of the comfort of these houses is the number of bedrooms in them. Let X = {1, 2,
3, 4, ..., 10} be the set of available types of houses described by x = number
of bedrooms in a house. Then the fuzzy set “comfortable type of house for a
four-person family” may be described as

A = {(1, .2), (2, .5), (3, .8), (4, 1), (5, .7), (6, .3)}

In the literature one finds different ways of denoting fuzzy sets:

1. A fuzzy set is denoted by an ordered set of pairs, the first element of which
denotes the element and the second the degree of membership (as in definition
2-1).

Example 2-1b


A = “real numbers considerably larger than 10”

$$A = \{(x, \mu_A(x)) \mid x \in X\}$$

where

$$\mu_A(x) = \begin{cases} 0, & x \le 10 \\ \left(1 + (x - 10)^{-2}\right)^{-1}, & x > 10 \end{cases}$$

Example 2-1c


A = “real numbers close to 10”

$$A = \{(x, \mu_A(x)) \mid \mu_A(x) = (1 + (x - 10)^2)^{-1}\}$$

See figure 2-1.


2. A fuzzy set is represented solely by stating its membership function [for 
instance, Negoita and Ralescu 1975]. 


Figure 2-1. Real numbers close to 10.


3. $A = \mu_A(x_1)/x_1 + \mu_A(x_2)/x_2 + \cdots = \sum_{i=1}^{n} \mu_A(x_i)/x_i$

or $\int_X \mu_A(x)/x$


Example 2-1d
A = “integers close to 10”


A = 0.1/7 + 0.5/8 + 0.8/9 + 1/10 + 0.8/11 + 0.5/12 + 0.1/13


Example 2-1e


A = “real numbers close to 10”

$$A = \int_{\mathbb{R}} \frac{1}{1 + (x - 10)^2} \Big/ x$$
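
The notations above translate directly into code. The following Python sketch is
not part of the original text. It assumes that a discrete fuzzy set is stored as
a dictionary from element to degree of membership (notation 1) and that a
continuous one is given by a plain function (notation 2). All names are
illustrative choices.

```python
# Discrete fuzzy set of example 2-1a: element -> degree of membership.
# Elements with degree zero are omitted, as in the text.
comfortable = {1: 0.2, 2: 0.5, 3: 0.8, 4: 1.0, 5: 0.7, 6: 0.3}


def membership(fuzzy_set, x):
    """Degree of membership of x in a discrete fuzzy set (0 if not listed)."""
    return fuzzy_set.get(x, 0.0)


def mu_considerably_larger_than_10(x):
    """Membership function of example 2-1b."""
    if x <= 10:
        return 0.0
    return 1.0 / (1.0 + (x - 10.0) ** -2)


def mu_close_to_10(x):
    """Membership function of example 2-1c."""
    return 1.0 / (1.0 + (x - 10.0) ** 2)
```

The dictionary form is convenient for finite universes, while the function form
covers universes such as the real line, where the pairs cannot be listed.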


It has already been mentioned that the membership function is not limited to
values between 0 and 1. If $\sup_x \mu_A(x) = 1$, the fuzzy set A is called
normal. A nonempty fuzzy set A can always be normalized by dividing $\mu_A(x)$
by $\sup_x \mu_A(x)$. As a matter of convenience, we will generally assume that
fuzzy sets are normalized. For the representation of fuzzy sets, we will use
notation 1, illustrated in examples 2-1b and 2-1c.

A fuzzy set is obviously a generalization of a classical set and the membership
function a generalization of the characteristic function. Since we are generally
referring to a universal (crisp) set X, some elements of a fuzzy set may have
the degree of membership zero. Often it is appropriate to consider those
elements of the universe that have a nonzero degree of membership in a fuzzy
set.


Definition 2-2


The support of a fuzzy set A, S(A), is the crisp set of all x ∈ X such that
$\mu_A(x) > 0$.


Example 2-2
Let us consider example 2-1a again: the support is S(A) = {1, 2, 3, 4, 5, 6}.
The elements (types of houses) {7, 8, 9, 10} are not part of the support of A.


A more general and even more useful notion is that of an α-level set.


Definition 2-3


The (crisp) set of elements that belong to the fuzzy set A at least to the
degree α is called the α-level set:

$$A_\alpha = \{x \in X \mid \mu_A(x) \ge \alpha\}$$

$A'_\alpha = \{x \in X \mid \mu_A(x) > \alpha\}$ is called “strong α-level set”
or “strong α-cut.”


Example 2-3
We refer again to example 2-1a and list possible α-level sets:

$$A_{.2} = \{1, 2, 3, 4, 5, 6\}$$
$$A_{.5} = \{2, 3, 4, 5\}$$
$$A_{.8} = \{3, 4\}$$
$$A_{1} = \{4\}$$

The strong α-level set for α = .8 is $A'_{.8} = \{4\}$.
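
The level sets of a finite fuzzy set can be computed by simple filtering. The
sketch below is an illustration added here, not the author's code. It assumes
the dictionary representation of example 2-1a, and the function names
`alpha_cut` and `support` are hypothetical. The support is obtained as the
strong level set for α = 0, which matches definition 2-2.

```python
comfortable = {1: 0.2, 2: 0.5, 3: 0.8, 4: 1.0, 5: 0.7, 6: 0.3}


def alpha_cut(fuzzy_set, alpha, strong=False):
    """Crisp set of elements whose membership is >= alpha (> alpha if strong)."""
    if strong:
        return {x for x, mu in fuzzy_set.items() if mu > alpha}
    return {x for x, mu in fuzzy_set.items() if mu >= alpha}


def support(fuzzy_set):
    """Support S(A): all elements with a nonzero degree of membership."""
    return alpha_cut(fuzzy_set, 0.0, strong=True)


for alpha in (0.2, 0.5, 0.8, 1.0):
    print(alpha, sorted(alpha_cut(comfortable, alpha)))
```

Because the level sets are nested, lowering α can only add elements, which is
visible in the printed lists.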


Convexity also plays a role in fuzzy set theory. By contrast to classical set theory, 
however, convexity conditions are defined with reference to the membership 
function rather than the support of the fuzzy set. 


Definition 2-4
A fuzzy set A is convex if

$$\mu_A(\lambda x_1 + (1 - \lambda) x_2) \ge \min\{\mu_A(x_1), \mu_A(x_2)\}, \quad x_1, x_2 \in X, \ \lambda \in [0, 1]$$

Alternatively, a fuzzy set is convex if all α-level sets are convex.


Figure 2-2a. Convex fuzzy set.


Figure 2-2b. Nonconvex fuzzy set.


Example 2-4 


Figure 2-2a depicts a convex fuzzy set, whereas figure 2-2b illustrates a
nonconvex fuzzy set.
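
For a fuzzy set on a finite, ordered universe, the convexity condition can be
checked mechanically. The following sketch is an added illustration under a
stated assumption: instead of all convex combinations on the real line, it
tests only the universe points lying between two given points. The set
`bimodal` is a made-up example, not taken from the text.

```python
def is_convex(fuzzy_set, universe):
    """Check mu(x_k) >= min(mu(x_i), mu(x_j)) for all x_i < x_k < x_j."""
    xs = sorted(universe)
    mu = [fuzzy_set.get(x, 0.0) for x in xs]
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            bound = min(mu[i], mu[j])
            # Every point strictly between x_i and x_j must reach the bound.
            if any(mu[k] < bound for k in range(i + 1, j)):
                return False
    return True


comfortable = {1: 0.2, 2: 0.5, 3: 0.8, 4: 1.0, 5: 0.7, 6: 0.3}
bimodal = {1: 0.2, 2: 0.9, 3: 0.4, 4: 0.8}  # hypothetical two-peaked set

print(is_convex(comfortable, range(1, 11)))
print(is_convex(bimodal, range(1, 5)))
```

A membership function that rises to a single peak and then falls passes the
check, whereas a second peak after a dip violates it.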

One final feature of a fuzzy set, which we will use frequently in later
chapters, is its cardinality or “power” [Zadeh 1981c].


Definition 2-5


For a finite fuzzy set A, the cardinality |A| is defined as

$$|A| = \sum_{x \in X} \mu_A(x)$$

$\|A\| = \dfrac{|A|}{|X|}$ is called the relative cardinality of A.


Obviously, the relative cardinality of a fuzzy set depends on the cardinality of the 
universe. So you have to choose the same universe if you want to compare fuzzy 
sets by their relative cardinality. 


Example 2-5


For the fuzzy set “comfortable type of house for a four-person family” from
example 2-1a, the cardinality is

$$|A| = .2 + .5 + .8 + 1 + .7 + .3 = 3.5$$

Its relative cardinality is

$$\|A\| = \frac{3.5}{10} = 0.35$$

The relative cardinality can be interpreted as the fraction of elements of X
being in A, weighted by their degrees of membership in A. For infinite X, the
cardinality is defined by $|A| = \int_X \mu_A(x)\,dx$. Of course, |A| does not
always exist.
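
As a quick check of example 2-5, the two quantities can be computed in a few
lines. This is a minimal sketch added for illustration. It assumes the
dictionary representation of the fuzzy set and the universe X = {1, ..., 10} of
example 2-1a, and the function names are hypothetical.

```python
def cardinality(fuzzy_set):
    """|A|: sum of all degrees of membership of a finite fuzzy set."""
    return sum(fuzzy_set.values())


def relative_cardinality(fuzzy_set, universe):
    """||A||: cardinality divided by the number of elements in the universe."""
    return cardinality(fuzzy_set) / len(universe)


comfortable = {1: 0.2, 2: 0.5, 3: 0.8, 4: 1.0, 5: 0.7, 6: 0.3}
universe = range(1, 11)  # X = {1, 2, ..., 10}

print(round(cardinality(comfortable), 10))
print(round(relative_cardinality(comfortable, universe), 10))
```

The rounding only hides floating-point noise in the sum; the same set on a
larger universe would yield a smaller relative cardinality.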


2.2 Basic Set-Theoretic Operations for Fuzzy Sets 


The membership function is obviously the crucial component of a fuzzy set. It is 
therefore not surprising that operations with fuzzy sets are defined via their mem- 
bership functions. We shall first present the concepts suggested by Zadeh in 1965 
[Zadeh 1965, p. 310]. They constitute a consistent framework for the theory of 
fuzzy sets. They are, however, not the only possible way to extend classical set 
theory consistently. Zadeh and other authors have suggested alternative or addi- 
tional definitions for set-theoretic operations, which will be discussed in chapter 3. 


Definition 2-6
The membership function $\mu_C(x)$ of the intersection C = A ∩ B is pointwise
defined by

$$\mu_C(x) = \min\{\mu_A(x), \mu_B(x)\}, \quad x \in X$$


Definition 2-7
The membership function $\mu_D(x)$ of the union D = A ∪ B is pointwise defined
by

$$\mu_D(x) = \max\{\mu_A(x), \mu_B(x)\}, \quad x \in X$$


Definition 2-8


The membership function of the complement of a normalized fuzzy set A,
$\mu_{\complement A}(x)$, is defined by

$$\mu_{\complement A}(x) = 1 - \mu_A(x), \quad x \in X$$


Example 2-6


Let A be the fuzzy set “comfortable type of house for a four-person family” from
example 2-1a and B be the fuzzy set “large type of house” defined as

B = {(3, .2), (4, .4), (5, .6), (6, .8), (7, 1), (8, 1)}

The intersection C = A ∩ B is then

C = {(3, .2), (4, .4), (5, .6), (6, .3)}

The union D = A ∪ B is

D = {(1, .2), (2, .5), (3, .8), (4, 1), (5, .7), (6, .8), (7, 1), (8, 1)}

The complement ∁B, which might be interpreted as “not large type of house,” is

∁B = {(1, 1), (2, 1), (3, .8), (4, .6), (5, .4), (6, .2), (9, 1), (10, 1)}
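
The results of example 2-6 can be reproduced with the min, max, and 1 − μ
operators of definitions 2-6 to 2-8. The following sketch is an added
illustration and assumes dictionaries as the representation of finite fuzzy
sets. The function names are hypothetical, and the rounding in `complement` is
there only to suppress floating-point noise.

```python
def intersection(a, b):
    """Pointwise minimum; elements with degree zero are dropped."""
    result = {}
    for x in a.keys() | b.keys():
        degree = min(a.get(x, 0.0), b.get(x, 0.0))
        if degree > 0:
            result[x] = degree
    return result


def union(a, b):
    """Pointwise maximum over all listed elements."""
    return {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in a.keys() | b.keys()}


def complement(a, universe):
    """1 - mu for every element of the universe; zero degrees are dropped."""
    result = {}
    for x in universe:
        degree = round(1.0 - a.get(x, 0.0), 10)
        if degree > 0:
            result[x] = degree
    return result


A = {1: 0.2, 2: 0.5, 3: 0.8, 4: 1.0, 5: 0.7, 6: 0.3}
B = {3: 0.2, 4: 0.4, 5: 0.6, 6: 0.8, 7: 1.0, 8: 1.0}
X = range(1, 11)

print(sorted(intersection(A, B).items()))
print(sorted(union(A, B).items()))
print(sorted(complement(B, X).items()))
```

Note that the complement needs the universe as an argument, because elements
that are not listed in B have the degree 1 in ∁B.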


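Example 2-7


Let us assume that

A = “x is considerably larger than 10,” and

B = “x is approximately 11,” characterized by

$$A = \{(x, \mu_A(x)) \mid x \in X\}$$

where

$$\mu_A(x) = \begin{cases} 0, & x \le 10 \\ \left(1 + (x - 10)^{-2}\right)^{-1}, & x > 10 \end{cases}$$

and

$$B = \{(x, \mu_B(x)) \mid x \in X\}$$

where

$$\mu_B(x) = \left(1 + (x - 11)^4\right)^{-1}$$

Then

$$\mu_{A \cap B}(x) = \begin{cases} \min\left[\left(1 + (x - 10)^{-2}\right)^{-1}, \left(1 + (x - 11)^4\right)^{-1}\right] & \text{for } x > 10 \\ 0 & \text{for } x \le 10 \end{cases}$$

(x is considerably larger than 10 and approximately 11)

$$\mu_{A \cup B}(x) = \max\left[\left(1 + (x - 10)^{-2}\right)^{-1}, \left(1 + (x - 11)^4\right)^{-1}\right], \quad x \in X$$

Figure 2-3 depicts the above.

Figure 2-3. Union and intersection of fuzzy sets.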
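
For continuous membership functions the same operators are applied pointwise.
The sketch below is an added illustration of example 2-7. It assumes
$\mu_A(x) = (1 + (x - 10)^{-2})^{-1}$ for x > 10 (and 0 otherwise) and
$\mu_B(x) = (1 + (x - 11)^4)^{-1}$; the exponents are read from a damaged
source and should be treated as an assumption. The function names are
hypothetical.

```python
def mu_a(x):
    """'x is considerably larger than 10' (assumed form, see lead-in)."""
    if x <= 10:
        return 0.0
    return 1.0 / (1.0 + (x - 10.0) ** -2)


def mu_b(x):
    """'x is approximately 11' (assumed exponent 4, see lead-in)."""
    return 1.0 / (1.0 + (x - 11.0) ** 4)


def mu_intersection(x):
    """Membership in A and B: pointwise minimum."""
    return min(mu_a(x), mu_b(x))


def mu_union(x):
    """Membership in A or B: pointwise maximum."""
    return max(mu_a(x), mu_b(x))


for x in (9.0, 10.5, 11.0, 12.0, 15.0):
    print(x, round(mu_intersection(x), 4), round(mu_union(x), 4))
```

Evaluating the two functions on a fine grid and plotting them would give
curves of the kind shown in figure 2-3.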


It has already been mentioned that min and max are not the only operators that 
could have been chosen to model the intersection or union, respectively, of fuzzy 
sets. The question arises, why those and not others? Bellman and Giertz addressed 
this question axiomatically in 1973 [Bellman and Giertz 1973, p. 151]. They 
argued from a logical point of view, interpreting the intersection as “logical and,” 
the union as “logical or,” and the fuzzy set A as the statement “The element x 
belongs to set A,” which can be accepted as more or less true. It is very instruc- 
tive to follow their line of argument, which is an excellent example for an 
axiomatic justification of specific mathematical models. We shall therefore sketch 
their reasoning: Consider two statements, S and T, for which the truth values
are $\mu_S$ and $\mu_T$, respectively, $\mu_S, \mu_T \in [0, 1]$. The truth
values of the “and” and “or” combinations of these statements, μ(S and T) and
μ(S or T), both from the interval [0, 1], are interpreted as the values of the
membership functions of the intersection and union, respectively, of S and T. We
are now looking for two real-valued functions f and g such that

$$\mu_{S \text{ and } T} = f(\mu_S, \mu_T)$$
$$\mu_{S \text{ or } T} = g(\mu_S, \mu_T)$$

Bellman and Giertz feel that the following restrictions are reasonably imposed
on f and g:


i. f and g are nondecreasing and continuous in $\mu_S$ and $\mu_T$.
ii. f and g are symmetric, that is,

$$f(\mu_S, \mu_T) = f(\mu_T, \mu_S)$$
$$g(\mu_S, \mu_T) = g(\mu_T, \mu_S)$$

iii. $f(\mu_S, \mu_S)$ and $g(\mu_S, \mu_S)$ are strictly increasing in $\mu_S$.

iv. $f(\mu_S, \mu_T) \le \min(\mu_S, \mu_T)$ and
$g(\mu_S, \mu_T) \ge \max(\mu_S, \mu_T)$. This implies that accepting the truth
of the statement “S and T” requires more, and accepting the truth of the
statement “S or T” less, than accepting S or T alone as true.

v. f(1, 1) = 1 and g(0, 0) = 0.

vi. Logically equivalent statements must have equal truth values, and fuzzy sets
with the same contents must have the same membership functions, that is,

$S_1$ and ($S_2$ or $S_3$)
is equivalent to
($S_1$ and $S_2$) or ($S_1$ and $S_3$)

and therefore must be equally true.


Bellman and Giertz now formalize the above assumptions as follows: Using the
symbols ∧ for “and” (= intersection) and ∨ for “or” (= union), these assumptions
amount to the following seven restrictions, to be imposed on the two commutative
(see (ii)) and associative (see (vi)) binary compositions ∧ and ∨ on the closed
interval [0, 1], which are mutually distributive (see (vi)) with respect to one
another.


1. $\mu_S \wedge \mu_T = \mu_T \wedge \mu_S$
   $\mu_S \vee \mu_T = \mu_T \vee \mu_S$

2. $(\mu_S \wedge \mu_T) \wedge \mu_U = \mu_S \wedge (\mu_T \wedge \mu_U)$
   $(\mu_S \vee \mu_T) \vee \mu_U = \mu_S \vee (\mu_T \vee \mu_U)$

3. $\mu_S \wedge (\mu_T \vee \mu_U) = (\mu_S \wedge \mu_T) \vee (\mu_S \wedge \mu_U)$
   $\mu_S \vee (\mu_T \wedge \mu_U) = (\mu_S \vee \mu_T) \wedge (\mu_S \vee \mu_U)$

4. $\mu_S \wedge \mu_T$ and $\mu_S \vee \mu_T$ are continuous and nondecreasing
   in each component

5. $\mu_S \wedge \mu_S$ and $\mu_S \vee \mu_S$ are strictly increasing in
   $\mu_S$ (see (iii))

6. $\mu_S \wedge \mu_T \le \min(\mu_S, \mu_T)$
   $\mu_S \vee \mu_T \ge \max(\mu_S, \mu_T)$ (see (iv))

7. $1 \wedge 1 = 1$
   $0 \vee 0 = 0$ (see (v))


Bellman and Giertz then prove mathematically [see Bellman and Giertz 1973,
p. 154] that

$$\mu_{S \wedge T} = \min(\mu_S, \mu_T) \quad \text{and} \quad \mu_{S \vee T} = \max(\mu_S, \mu_T)$$
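
A numerical spot check can make these restrictions more tangible. The sketch
below is an added illustration, not the proof by Bellman and Giertz. It tests
restrictions 1, 2, 3, 6, and 7 on a small grid of truth values and does not
cover the continuity and monotonicity requirements 4 and 5. The pair "algebraic
product / algebraic sum" is used here only as a contrasting example; all
function names are hypothetical.

```python
from itertools import product

GRID = [0.0, 0.25, 0.5, 0.75, 1.0]


def satisfies_restrictions(conj, disj, grid=GRID, tol=1e-12):
    """Spot-check restrictions 1, 2, 3, 6 and 7 on a finite grid."""
    for s, t, u in product(grid, repeat=3):
        checks = [
            abs(conj(s, t) - conj(t, s)) <= tol,                    # 1
            abs(disj(s, t) - disj(t, s)) <= tol,                    # 1
            abs(conj(conj(s, t), u) - conj(s, conj(t, u))) <= tol,  # 2
            abs(disj(disj(s, t), u) - disj(s, disj(t, u))) <= tol,  # 2
            abs(conj(s, disj(t, u)) - disj(conj(s, t), conj(s, u))) <= tol,  # 3
            abs(disj(s, conj(t, u)) - conj(disj(s, t), disj(s, u))) <= tol,  # 3
            conj(s, t) <= min(s, t) + tol,                          # 6
            disj(s, t) >= max(s, t) - tol,                          # 6
        ]
        if not all(checks):
            return False
    # Restriction 7: boundary values.
    return abs(conj(1.0, 1.0) - 1.0) <= tol and abs(disj(0.0, 0.0)) <= tol


def algebraic_product(a, b):
    return a * b


def algebraic_sum(a, b):
    return a + b - a * b


print(satisfies_restrictions(min, max))
print(satisfies_restrictions(algebraic_product, algebraic_sum))
```

The second pair fails because it is not mutually distributive: for
s = t = u = 0.5, s ∧ (t ∨ u) gives 0.375, whereas (s ∧ t) ∨ (s ∧ u) gives
0.4375.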


For the complement, it would be reasonable to assume that if statement “S” is
true, its complement “non S” is false, or if $\mu_S = 1$ then
$\mu_{\text{non } S} = 0$ and vice versa. The function h (as complement in
analogy to f and g for intersection and union) should also be continuous and
monotonically decreasing, and we would like the complement of the complement to
be the original statement (in order to be in line with traditional logic and set
theory). These requirements, however, are not enough to determine uniquely the
mathematical form of the complement. Bellman and Giertz require in addition that
h(1/2) = 1/2. Other assumptions are certainly possible and plausible.
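
That the first requirements do not fix the complement uniquely can be seen from
a concrete alternative. The sketch below is an added illustration: besides the
standard complement 1 − μ, it uses the function (1 − μ)/(1 + λμ), known in the
literature as the Sugeno complement, which is not discussed in this section.
Both map 1 to 0 and 0 to 1, are decreasing, and are their own inverse, but only
the standard complement maps 1/2 to 1/2. The parameter value λ = 1 is an
arbitrary choice.

```python
def standard_complement(mu):
    """h(mu) = 1 - mu."""
    return 1.0 - mu


def sugeno_complement(mu, lam=1.0):
    """h(mu) = (1 - mu) / (1 + lam * mu), with lam > -1 (outside the source)."""
    return (1.0 - mu) / (1.0 + lam * mu)


GRID = [i / 10 for i in range(11)]

for h in (standard_complement, sugeno_complement):
    boundary_ok = h(1.0) == 0.0 and h(0.0) == 1.0
    involutive = all(abs(h(h(mu)) - mu) < 1e-12 for mu in GRID)
    decreasing = all(h(a) > h(b) for a, b in zip(GRID, GRID[1:]))
    print(h.__name__, boundary_ok, involutive, decreasing, round(h(0.5), 4))
```

The additional requirement h(1/2) = 1/2 rules out the second function for
every λ other than 0.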


Exercises 


1. Model the following expressions as fuzzy sets: 
a. Large integers 
b. Very small numbers 
c. Medium-sized men 
d. Numbers approximately between 10 and 20 
e. High speeds for racing cars 
2. Determine all α-level sets and all strong α-level sets for the following
fuzzy sets:
a. A = {(3, 1), (4, .2), (5, .3), (6, .4), (7, .6), (8, .8), (10, 1), (12, .8),
(14, .6)}
b. $B = \{(x, \mu_B(x) = (1 + (x - 10)^2)^{-1})\}$
for α = .3, .5, .8
c. $C = \{(x, \mu_C(x)) \mid x \in \mathbb{R}\}$
where $\mu_C(x) = 0$ for $x \le 10$
$\mu_C(x) = (1 + (x - 10)^{-2})^{-1}$ for $x > 10$
3. Which of the fuzzy sets of exercise 2 are convex and which are not?
4. Let X = {1, 2, ..., 10}. Determine the cardinalities and relative
cardinalities of the following fuzzy sets:
a. A from exercise 2a 
b. B = {(2, .4), (3, .6), (4, .8), (5, 1), (6, .8), (7, .6), (8, .4)} 
c. C = {(2, .4), (4, .8), (5, 1), (7, .6)} 
5. Determine the intersections and unions of the following fuzzy sets: 
a. The fuzzy sets A, B, and C from exercise 4 
b. B and C from exercise 2 
6. Determine the intersection and the union of the complements of fuzzy sets 
B and C from exercise 4. 


3 EXTENSIONS 


3.1 Types of Fuzzy Sets 


In chapter 2, the basic definition of a fuzzy set was given and the original set- 
theoretic operations were discussed. The membership space was assumed to be 
the space of real numbers, membership functions were crisp functions, and the 
operations corresponded essentially to the operations of dual logic or Boolean 
algebra. 

Different extensions of the basic concept discussed in chapter 2 are possible. 
They may concern the definition of a fuzzy set or they may concern the opera- 
tions with fuzzy sets. With respect to the definition of a fuzzy set, different struc- 
tures may be imposed on the membership space and different assumptions may 
be made concerning the membership function. These extensions will be treated 
in section 3.1. 

It was assumed in chapter 2 that the logical “and” corresponds to the set- 
theoretic intersection, which in turn is modeled by the min-operator. The 
same type of relationship was assumed for the logical “or,” the union, and the 
max-operator. Departing from the well-established systems of dual logic and 
Boolean algebra, alternative and additional definitions for terms such as 
intersection and union, for their interpretation as “and” and “or,” and for their
mathematical models can be conceived. These concepts will be discussed in
section 3.2.

So far we have considered fuzzy sets with crisply defined membership func- 
tions or degrees of membership. It is doubtful whether, for instance, human beings 
have or can have a crisp image of membership functions in their minds. Zadeh 
[1973a, p. 52] therefore suggested the notion of a fuzzy set whose membership 
function itself is a fuzzy set. If we call fuzzy sets, such as those considered so 
far, type 1 fuzzy sets, then a type 2 fuzzy set can be defined as follows. 


Definition 3-1 


A type 2 fuzzy set is a fuzzy set whose membership values are type 1 fuzzy sets 
on [0, 1]. 


The operations intersection, union, and complement defined so far are no longer 
adequate for type 2 fuzzy sets. We will, however, postpone the discussions for 
adequate operators until section 5.2, that is, until we have presented the exten- 
sion principle, which shall prove very useful for this purpose. By the same token 
by which we introduced type 2 fuzzy sets, it could be argued that there is no 
obvious reason why the membership functions of type 2 fuzzy sets should be 
crisp. A natural extension of these type 2 fuzzy sets is therefore the definition of 
type m fuzzy sets. 


Definition 3-2 


A type m fuzzy set is a fuzzy set in X whose membership values are type m − 1,
m > 1, fuzzy sets on [0, 1].


From a practical point of view, such type m fuzzy sets for large m (even for
m ≥ 3) are hard to deal with, and it will be extremely difficult or even
impossible to measure them or to visualize them. We will, therefore, not even
try to define the usual operations on them.

There have been other attempts to include vagueness that goes beyond the 
fuzziness of ordinary type 1 fuzzy sets. One example is the “stochastic fuzzy 
model” of Norwich and Turksen [1981, 1984]. Those authors were mainly con- 
cerned with the measurement and the scale level of membership functions. They 
view a fuzzy set as a family of random variables whose density functions are esti- 
mated by that stochasticity [Norwich and Turksen 1984, p. 21]. 

Hirota [1981] also considers fuzzy sets for which the “value of membership
functions is a random variable.”


Definition 3-3 [Hirota 1981, p. 35]
A probabilistic set A on X is defined by a defining function $\mu_A$,

$$\mu_A: X \times \Omega \ni (x, \omega) \to \mu_A(x, \omega) \in \Omega_C$$

where $\mu_A(x, \cdot)$ is the $(\mathcal{B}, \mathcal{B}_C)$-measurable
function for each fixed x ∈ X.

For Hirota, a probabilistic set A with the defining function
$\mu_A(x, \omega)$ is contained in a probabilistic set B with
$\mu_B(x, \omega)$ if for each x ∈ X there exists an $E \in \mathcal{B}$ that
satisfies P(E) = 1 and $\mu_A(x, \omega) \le \mu_B(x, \omega)$ for all ω ∈ E.
$(\Omega, \mathcal{B}, P)$ is called the parameter space.


One of the main advantages of the notion of probabilistic sets in modeling fuzzy 
and stochastic features of a system is asserted to be the applicability of moment 
analysis, that is, the possibility of computing moments such as expectation and 
variance. Figure 3-1 indicates the difference between the appearance of fuzzy
sets and probabilistic sets [Hirota 1981, p. 33]. Of course, the mathematical
properties of probabilistic sets differ from those of fuzzy sets, and so do the
mathematical models for intersection, union, and so on.

A more general definition of a fuzzy set than is given in definition 2-1 is that
of an L-fuzzy set [Goguen 1967; De Luca and Termini 1972]. In contrast to the
above definition, the membership function of an L-fuzzy set maps into a
partially ordered set, L. Since the interval [0, 1] is a poset (partially
ordered set), the fuzzy set in definition 2-1 is a special L-fuzzy set.

Further attempts at representing vague and uncertain data with different types
of fuzzy sets were made by Atanassov and Stoeva [Atanassov and Stoeva 1983;
Atanassov 1986], who defined a generalization of the notion of fuzzy sets, the
intuitionistic fuzzy sets, and by Pawlak [Pawlak 1982], who developed the theory
of rough sets, where grades of membership are expressed by a concept of
approximation.


Definition 3-4 [Atanassov and Stoeva 1983]


Given an underlying set X of objects, an intuitionistic fuzzy set (IFS) A is a
set of ordered triples,

$$A = \{(x, \mu_A(x), \nu_A(x)) \mid x \in X\}$$

where $\mu_A(x)$ and $\nu_A(x)$ are functions mapping from X into [0, 1]. For
each x ∈ X, $\mu_A(x)$ represents the degree of membership of the element x to
the subset A of X, and $\nu_A(x)$ gives the degree of nonmembership. For the
functions $\mu_A(x)$ and $\nu_A(x)$ mapping into [0, 1], the condition
$0 \le \mu_A(x) + \nu_A(x) \le 1$ holds.

Ordinary fuzzy sets over X may be viewed as special intuitionistic fuzzy sets
with the nonmembership function $\nu_A(x) = 1 - \mu_A(x)$. In the same way as
fuzzy sets, intuitionistic L-fuzzy sets were defined by mapping the membership
functions into a partially ordered set L [Atanassov and Stoeva 1984].
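
The defining condition of an intuitionistic fuzzy set is easy to state in code.
The sketch below is an added illustration. It assumes that a finite IFS is
stored as a dictionary from element to a pair (degree of membership, degree of
nonmembership); the function names and the sample values are hypothetical.

```python
def is_valid_ifs(ifs, tol=1e-12):
    """Check 0 <= mu, nu <= 1 and mu + nu <= 1 for every element."""
    return all(
        0.0 <= mu <= 1.0 and 0.0 <= nu <= 1.0 and mu + nu <= 1.0 + tol
        for mu, nu in ifs.values()
    )


def from_fuzzy_set(fuzzy_set, universe):
    """Embed an ordinary fuzzy set as an IFS with nu = 1 - mu."""
    return {
        x: (fuzzy_set.get(x, 0.0), 1.0 - fuzzy_set.get(x, 0.0))
        for x in universe
    }


comfortable = {1: 0.2, 2: 0.5, 3: 0.8, 4: 1.0, 5: 0.7, 6: 0.3}
embedded = from_fuzzy_set(comfortable, range(1, 11))

print(is_valid_ifs(embedded))
print(is_valid_ifs({"a": (0.6, 0.3)}))  # mu + nu < 1 is allowed
print(is_valid_ifs({"a": (0.7, 0.5)}))  # mu + nu > 1 is not
```

In the embedded set the two degrees always add up to 1, whereas a general IFS
may leave part of the unit interval unassigned.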


Definition 3-5 [Pawlak 1985, p. 99; Pawlak et al. 1988]


Let U denote a set of objects called the universe and let R ⊆ U × U be an
equivalence relation on U. The pair A = (U, R) is called an approximation space.
For u, v ∈ U and (u, v) ∈ R, u and v belong to the same equivalence class, and
we say that they are indistinguishable in A. Therefore the relation R is called
an indiscernibility relation. Let $[x]_R$ denote an equivalence class
(elementary set of A) of R containing element x; then the lower and upper
approximations of a subset X ⊆ U in A, denoted $\underline{A}(X)$ and
$\overline{A}(X)$, respectively, are defined as follows:

$$\underline{A}(X) = \{x \in U \mid [x]_R \subseteq X\}$$
$$\overline{A}(X) = \{x \in U \mid [x]_R \cap X \ne \emptyset\}$$

If an object x belongs to the lower approximation $\underline{A}(X)$ of X in A,
then “x surely belongs to X in A”; $x \in \overline{A}(X)$ means that “x
possibly belongs to X in A.”

For the subset X ⊆ U representing a concept of interest, the approximation space
A = (U, R) can be characterized by three distinct regions of X in A: the
so-called positive region $\underline{A}(X)$, the boundary region
$\overline{A}(X) - \underline{A}(X)$, and the negative region
$U - \overline{A}(X)$.

The characterization of objects in X by the indiscernibility relation R is not
precise enough if the boundary region $\overline{A}(X) - \underline{A}(X)$ is
not empty. For this case it may be impossible to say whether an object belongs
to X or not, and so the set X is said to be nondefinable in A, and X is a rough
set.
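
The lower and upper approximations can be computed directly from the
equivalence classes. The following sketch is an added illustration. It assumes
that the indiscernibility relation is given by a key function (two objects are
indiscernible if the key agrees), and the universe, key, and target set are
made-up values.

```python
def equivalence_classes(universe, key):
    """Partition the universe: objects with the same key are indiscernible."""
    classes = {}
    for u in universe:
        classes.setdefault(key(u), set()).add(u)
    return list(classes.values())


def lower_approximation(classes, target):
    """Union of all classes that lie entirely inside the target set."""
    return set().union(*[c for c in classes if c <= target])


def upper_approximation(classes, target):
    """Union of all classes that share at least one object with the target."""
    return set().union(*[c for c in classes if c & target])


U = set(range(1, 9))
classes = equivalence_classes(U, key=lambda u: (u - 1) // 2)  # {1,2}, {3,4}, ...
X = {1, 2, 3}

lower = lower_approximation(classes, X)
upper = upper_approximation(classes, X)
print(sorted(lower), sorted(upper - lower), sorted(U - upper))
```

Here the boundary region is not empty, so X is a rough set in this
approximation space: object 3 cannot be told apart from object 4, which is not
in X.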


Pawlak [1985] shows that the concept of approximation given by the equivalence 
relation R and the approximation space may not, in general, be replaced by a 
membership function similar to that introduced by Zadeh. 

In order to take probabilistic information crucial to nondeterministic
classification problems into account, a natural probabilistic extension of the
rough-set model has been proposed [Pawlak et al. 1988].


3.2 Further Operations on Fuzzy Sets 


For the time being we return to ordinary fuzzy sets (type 1 fuzzy sets) and
consider additional operations on them that have been defined in the literature
and that will be useful or even necessary for later chapters.


3.2.1 Algebraic Operations


Definition 3-6


The Cartesian product of fuzzy sets is defined as follows: Let $A_1, \ldots,
A_n$ be fuzzy sets in $X_1, \ldots, X_n$. The Cartesian product is then a fuzzy
set in the product space $X_1 \times \cdots \times X_n$ with the membership
function

$$\mu_{(A_1 \times \cdots \times A_n)}(x) = \min_i \{\mu_{A_i}(x_i) \mid x = (x_1, \ldots, x_n), \ x_i \in X_i\}$$


Definition 3-7
The mth power of a fuzzy set A is a fuzzy set with the membership function

$$\mu_{A^m}(x) = [\mu_A(x)]^m, \quad x \in X$$

Additional algebraic operations are defined as follows:


Definition 3-8
The algebraic sum (probabilistic sum) C = A + B is defined as

$$C = \{(x, \mu_{A+B}(x)) \mid x \in X\}$$

where

$$\mu_{A+B}(x) = \mu_A(x) + \mu_B(x) - \mu_A(x) \cdot \mu_B(x)$$


Definition 3-9
The bounded sum C = A ⊕ B is defined as

$$C = \{(x, \mu_{A \oplus B}(x)) \mid x \in X\}$$

where

$$\mu_{A \oplus B}(x) = \min\{1, \mu_A(x) + \mu_B(x)\}$$


Definition 3-10
The bounded difference C = A ⊖ B is defined as

$$C = \{(x, \mu_{A \ominus B}(x)) \mid x \in X\}$$

where

$$\mu_{A \ominus B}(x) = \max\{0, \mu_A(x) + \mu_B(x) - 1\}$$


Definition 3-11
The algebraic product of two fuzzy sets C = A · B is defined as

$$C = \{(x, \mu_A(x) \cdot \mu_B(x)) \mid x \in X\}$$


Example 3-1


Let A(x) = {(3, .5), (5, 1), (7, .6)}
B(x) = {(3, 1), (5, .6)}
The above definitions are then illustrated by the following results:

A × B = {[(3; 3), .5], [(5; 3), 1], [(7; 3), .6],
         [(3; 5), .5], [(5; 5), .6], [(7; 5), .6]}
A² = {(3, .25), (5, 1), (7, .36)}
A + B = {(3, 1), (5, 1), (7, .6)}
A ⊕ B = {(3, 1), (5, 1), (7, .6)}
A ⊖ B = {(3, .5), (5, .6)}
A · B = {(3, .5), (5, .6)}
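
The results of example 3-1 can be verified with a few lines of code. The sketch
below is an added illustration that follows definitions 3-6 to 3-11. It assumes
dictionaries as the representation of finite fuzzy sets; the helper `pointwise`
and the other function names are hypothetical, and results are rounded to
suppress floating-point noise.

```python
def pointwise(op, a, b):
    """Combine two fuzzy sets element by element; drop zero degrees."""
    result = {}
    for x in sorted(a.keys() | b.keys()):
        degree = round(op(a.get(x, 0.0), b.get(x, 0.0)), 10)
        if degree > 0:
            result[x] = degree
    return result


def cartesian_product(a, b):
    """Definition 3-6 for two sets: minimum of the two degrees per pair."""
    return {(x, y): min(mx, my) for x, mx in a.items() for y, my in b.items()}


def power(a, m):
    """Definition 3-7: every degree raised to the mth power."""
    return {x: round(mu ** m, 10) for x, mu in a.items()}


def algebraic_sum(a, b):
    return pointwise(lambda p, q: p + q - p * q, a, b)


def bounded_sum(a, b):
    return pointwise(lambda p, q: min(1.0, p + q), a, b)


def bounded_difference(a, b):
    return pointwise(lambda p, q: max(0.0, p + q - 1.0), a, b)


def algebraic_product(a, b):
    return pointwise(lambda p, q: p * q, a, b)


A = {3: 0.5, 5: 1.0, 7: 0.6}
B = {3: 1.0, 5: 0.6}

print(cartesian_product(A, B))
print(power(A, 2))
print(algebraic_sum(A, B), bounded_sum(A, B))
print(bounded_difference(A, B), algebraic_product(A, B))
```

Element 7 disappears from the bounded difference and the algebraic product
because its degree in B is zero.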


3.2.2 Set-Theoretic Operations 


In chapter 2 the intersection of fuzzy sets, interpreted as the logical “and,”
was modeled as the min-operator and the union, interpreted as “or,” as the
max-operator. Other operators have also been suggested. These suggestions vary
with respect to the generality or adaptability of the operators as well as to
the degree to which and how they are justified. Justification ranges from
intuitive argumentation to empirical or axiomatic justification. Adaptability
ranges from uniquely defined (that is, nonadaptable) concepts via parameterized
“families” of operators to general classes of operators that satisfy certain
properties.

We shall investigate the two basic classes of operators: operators for the
intersection and union of fuzzy sets (referred to as triangular norms and
conorms) and the class of averaging operators, which model connectives for fuzzy
sets between t-norms and t-conorms. Each class contains parameterized as well as
nonparameterized operators.

t-norms. The study of t-norms began in 1942 with the paper “Statistical metrics”
[Menger 1942]. Menger intended to construct metric spaces in which probability
distributions rather than numbers are used to describe the distance between two
elements of the respective space. Berthold Schweizer and Abe Sklar [Schweizer
and Sklar 1961] provided the axioms of t-norms as they are used today.

The mathematical aspects of t-norms are excellently presented in the book by
Klement, Mesiar and Pap [Klement et al. 2000]. The use of t-norms and t-conorms
for modeling the intersection and union of fuzzy sets goes back to the 1970s;
see, e.g., [Kruse et al. 1994]. Another source is basic psycho-linguistic
research that tried to model quantitatively the linguistic “and” and “or”
[Zimmermann and Zysno 1980, 1982, 1983, Thole, Zimmermann and Zysno 1979]. In
the following we shall concentrate on those t-norms and t-conorms which are most
common in fuzzy set theory. For mathematical derivations, proofs and other
t-norms the reader is referred to the above-mentioned book [Klement et al.
2000].


Let us first turn to basic definitions: 


Definition 3-12 [Dubois and Prade 1980a, p. 17] 


t-norms are two-placed functions t that map from [0, 1] × [0, 1] into [0, 1] and satisfy the following conditions:

1. $t(0, 0) = 0$; $t(\mu_A(x), 1) = t(1, \mu_A(x)) = \mu_A(x)$, $x \in X$
2. $t(\mu_A(x), \mu_B(x)) \le t(\mu_C(x), \mu_D(x))$
   if $\mu_A(x) \le \mu_C(x)$ and $\mu_B(x) \le \mu_D(x)$ (monotonicity)
3. $t(\mu_A(x), \mu_B(x)) = t(\mu_B(x), \mu_A(x))$ (commutativity)
4. $t(\mu_A(x), t(\mu_B(x), \mu_C(x))) = t(t(\mu_A(x), \mu_B(x)), \mu_C(x))$ (associativity)


The functions t define a general class of intersection operators for fuzzy sets. The operators belonging to this class of t-norms are, in particular, associative (see condition 4), and therefore it is possible to compute the membership values for the intersection of more than two fuzzy sets by recursively applying a t-norm operator [Bonissone and Decker 1986, p. 220].
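The recursive application of a t-norm to more than two fuzzy sets can be sketched as a fold over the membership degrees of one element x. The function names and the three sample degrees below are hypothetical and serve only as an illustration.

```python
from functools import reduce


def t_min(a, b):
    """Minimum t-norm."""
    return min(a, b)


def t_product(a, b):
    """Algebraic product t-norm."""
    return a * b


def t_bounded_difference(a, b):
    """Bounded difference t-norm."""
    return max(0.0, a + b - 1.0)


def intersect_many(t_norm, degrees):
    """Fold a binary t-norm over a sequence of membership degrees.

    Starting from 1 is safe because condition 1 gives t(a, 1) = a.
    """
    return reduce(t_norm, degrees, 1.0)


# Hypothetical degrees of membership of one element x in three fuzzy sets.
degrees = [0.9, 0.8, 0.5]

by_min = intersect_many(t_min, degrees)
by_product = intersect_many(t_product, degrees)
by_bounded_difference = intersect_many(t_bounded_difference, degrees)
```

Because of associativity, the grouping of the operands does not influence the result, which is why a simple left fold suffices.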


t-conorms (or s-norms). For the union of fuzzy sets, the max-operator, the algebraic sum [Zadeh 1965], and the “bold union” [Giles 1976], modeled by the “bounded sum,” have been suggested.

Corresponding to the class of intersection operators, a general class of aggregation operators for the union of fuzzy sets called triangular conorms or t-conorms (sometimes referred to as s-norms) is defined analogously [Dubois and Prade 1985, p. 90; Mizumoto 1989, p. 221]. The max-operator, algebraic sum, and bounded sum considered above belong to this class.


Definition 3-13 [Dubois and Prade 1985, p. 90]


t-conorms or s-norms are associative, commutative, and monotonic two-placed functions s that map from [0, 1] × [0, 1] into [0, 1]. These properties are formulated with the following conditions:

1. $s(1, 1) = 1$; $s(\mu_A(x), 0) = s(0, \mu_A(x)) = \mu_A(x)$, $x \in X$
2. $s(\mu_A(x), \mu_B(x)) \le s(\mu_C(x), \mu_D(x))$
   if $\mu_A(x) \le \mu_C(x)$ and $\mu_B(x) \le \mu_D(x)$ (monotonicity)
3. $s(\mu_A(x), \mu_B(x)) = s(\mu_B(x), \mu_A(x))$ (commutativity)
4. $s(\mu_A(x), s(\mu_B(x), \mu_C(x))) = s(s(\mu_A(x), \mu_B(x)), \mu_C(x))$ (associativity)


t-norms and t-conorms are related in a sense of logical duality. Alsina [Alsina 1985] defined a t-conorm as a two-placed function s mapping from [0, 1] × [0, 1] into [0, 1] such that the function t, defined as

$t(\mu_A(x), \mu_B(x)) = 1 - s(1 - \mu_A(x), 1 - \mu_B(x))$

is a t-norm. So any t-conorm s can be generated from a t-norm t through this transformation. More generally, Bonissone and Decker [1986] showed that for suitable negation operators like the complement operator for fuzzy sets, defined as $n(\mu_A(x)) = 1 - \mu_A(x)$ (see chapter 2), pairs of t-norms t and t-conorms s satisfy the following generalization of DeMorgan’s law [Bonissone and Decker 1986, p. 220]:

$s(\mu_A(x), \mu_B(x)) = n(t(n(\mu_A(x)), n(\mu_B(x))))$ and
$t(\mu_A(x), \mu_B(x)) = n(s(n(\mu_A(x)), n(\mu_B(x))))$, $x \in X$
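The duality can be made concrete with a small sketch that generates a t-conorm from a given t-norm by means of the complement n(a) = 1 - a. The helper name `dual_conorm` and the sample degrees are assumptions introduced for illustration.

```python
def dual_conorm(t_norm):
    """Return the t-conorm s(a, b) = 1 - t(1 - a, 1 - b) dual to t_norm."""
    def s(a, b):
        return 1.0 - t_norm(1.0 - a, 1.0 - b)
    return s


def t_product(a, b):
    """Algebraic product."""
    return a * b


def t_bounded_difference(a, b):
    """Bounded difference."""
    return max(0.0, a + b - 1.0)


s_from_min = dual_conorm(min)                       # expected: maximum
s_from_product = dual_conorm(t_product)             # expected: algebraic sum
s_from_bounded = dual_conorm(t_bounded_difference)  # expected: bounded sum

a, b = 0.3, 0.6  # hypothetical degrees of membership
```

For these three t-norms the generated functions coincide with the max-operator, the algebraic sum, and the bounded sum, respectively.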


Typical dual pairs of nonparameterized t-norms and t-conorms are compiled below [Bonissone and Decker 1986, p. 221; Mizumoto 1989, p. 220]:

Drastic product and drastic sum:
$t_w(\mu_A(x), \mu_B(x)) = \begin{cases} \min\{\mu_A(x), \mu_B(x)\} & \text{if } \max\{\mu_A(x), \mu_B(x)\} = 1 \\ 0 & \text{otherwise} \end{cases}$
$s_w(\mu_A(x), \mu_B(x)) = \begin{cases} \max\{\mu_A(x), \mu_B(x)\} & \text{if } \min\{\mu_A(x), \mu_B(x)\} = 0 \\ 1 & \text{otherwise} \end{cases}$

Bounded difference and bounded sum:
$t_1(\mu_A(x), \mu_B(x)) = \max\{0, \mu_A(x) + \mu_B(x) - 1\}$
$s_1(\mu_A(x), \mu_B(x)) = \min\{1, \mu_A(x) + \mu_B(x)\}$

Einstein product and Einstein sum:
$t_{1.5}(\mu_A(x), \mu_B(x)) = \dfrac{\mu_A(x) \cdot \mu_B(x)}{2 - [\mu_A(x) + \mu_B(x) - \mu_A(x) \cdot \mu_B(x)]}$
$s_{1.5}(\mu_A(x), \mu_B(x)) = \dfrac{\mu_A(x) + \mu_B(x)}{1 + \mu_A(x) \cdot \mu_B(x)}$

Algebraic product and algebraic sum:
$t_2(\mu_A(x), \mu_B(x)) = \mu_A(x) \cdot \mu_B(x)$
$s_2(\mu_A(x), \mu_B(x)) = \mu_A(x) + \mu_B(x) - \mu_A(x) \cdot \mu_B(x)$

Hamacher product and Hamacher sum:
$t_{2.5}(\mu_A(x), \mu_B(x)) = \dfrac{\mu_A(x) \cdot \mu_B(x)}{\mu_A(x) + \mu_B(x) - \mu_A(x) \cdot \mu_B(x)}$
$s_{2.5}(\mu_A(x), \mu_B(x)) = \dfrac{\mu_A(x) + \mu_B(x) - 2\mu_A(x) \cdot \mu_B(x)}{1 - \mu_A(x) \cdot \mu_B(x)}$

Minimum and maximum:
$t_3(\mu_A(x), \mu_B(x)) = \min\{\mu_A(x), \mu_B(x)\}$
$s_3(\mu_A(x), \mu_B(x)) = \max\{\mu_A(x), \mu_B(x)\}$


These operators are ordered as follows:

$t_w \le t_1 \le t_{1.5} \le t_2 \le t_{2.5} \le t_3$

$s_3 \le s_{2.5} \le s_2 \le s_{1.5} \le s_1 \le s_w$


We notice that this order implies that for any fuzzy sets A and B in X with membership values between 0 and 1, any intersection operator that is a t-norm is bounded by the min-operator and the operator $t_w$. A t-conorm is bounded by the max-operator and the operator $s_w$, respectively [Dubois and Prade 1982a, p. 42]:

$t_w(\mu_A(x), \mu_B(x)) \le t(\mu_A(x), \mu_B(x)) \le \min\{\mu_A(x), \mu_B(x)\}$
$\max\{\mu_A(x), \mu_B(x)\} \le s(\mu_A(x), \mu_B(x)) \le s_w(\mu_A(x), \mu_B(x))$, $x \in X$
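The ordering of the nonparameterized t-norms can be checked numerically on a grid of membership degrees. The following sketch is only a plausibility check, not a proof; the function names, the grid resolution, and the tolerance `EPS` are assumptions chosen for illustration.

```python
import itertools


def t_drastic(a, b):
    """Drastic product."""
    return min(a, b) if max(a, b) == 1.0 else 0.0


def t_bounded(a, b):
    """Bounded difference."""
    return max(0.0, a + b - 1.0)


def t_einstein(a, b):
    """Einstein product."""
    return a * b / (2.0 - (a + b - a * b))


def t_algebraic(a, b):
    """Algebraic product."""
    return a * b


def t_hamacher(a, b):
    """Hamacher product; the value at a = b = 0 is set to 0."""
    denominator = a + b - a * b
    return a * b / denominator if denominator > 0 else 0.0


def t_minimum(a, b):
    """Minimum."""
    return min(a, b)


# Listed from the smallest to the largest operator.
T_NORMS = [t_drastic, t_bounded, t_einstein, t_algebraic, t_hamacher, t_minimum]
EPS = 1e-12  # tolerance for floating-point rounding


def ordering_holds(steps=10):
    """Check t_w <= t_1 <= t_1.5 <= t_2 <= t_2.5 <= t_3 on a grid."""
    grid = [i / steps for i in range(steps + 1)]
    for a, b in itertools.product(grid, repeat=2):
        values = [t(a, b) for t in T_NORMS]
        if any(lo > hi + EPS for lo, hi in zip(values, values[1:])):
            return False
    return True
```

The reversed chain for the t-conorms follows from the duality s(a, b) = 1 - t(1 - a, 1 - b).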


It may be desirable to extend the range of the previously described operators in order to adapt them to the context in which they are used. To this end, different authors suggested parameterized families of t-norms and t-conorms, often maintaining the associativity property.

For illustration purposes, we review some interesting parameterized operators. Some of these operators and their equivalence to the logical “and” and “or,” respectively, have been justified axiomatically. We shall sketch the axioms on which the Hamacher-operator rests in order to give the reader the opportunity to compare the axiomatic system of Bellman and Giertz (min/max) on the one hand with that of the Hamacher-operator (which is essentially a family of product operators) on the other.


Definition 3-14 [Hamacher 1978]
The intersection of two fuzzy sets A and B is defined as
$A \cap B = \{(x, \mu_{A \cap B}(x)) \mid x \in X\}$
where

$\mu_{A \cap B}(x) = \dfrac{\mu_A(x)\mu_B(x)}{\gamma + (1 - \gamma)(\mu_A(x) + \mu_B(x) - \mu_A(x)\mu_B(x))}, \quad \gamma \ge 0$


Hamacher wants to derive a mathematical model for the “and” operator. His basic axioms are as follows:


A1. The operator ∧ is associative, that is, A ∧ (B ∧ C) = (A ∧ B) ∧ C.
A2. The operator ∧ is continuous.
A3. The operator ∧ is injective in each argument, that is,

(A ∧ B) = (A ∧ C) ⇒ B = C
(A ∧ B) = (C ∧ B) ⇒ A = C

(this is the essential difference between the Hamacher-operator and the Bellman-Giertz axioms).
A4. $\mu_A(x) = 1 \Rightarrow \mu_{A \wedge A}(x) = 1$


He then proves that a function f: R → [0, 1] exists with
$\mu_{A \wedge B}(x) = f(f^{-1}(\mu_A(x)) + f^{-1}(\mu_B(x)))$


If f is a rational function in $\mu_A(x)$ and $\mu_B(x)$, then the only possible operator is that shown in definition 3-14. (For γ = 1, this reduces to the algebraic product!)

Notice that the Hamacher-operator is the only H-strict t-norm that can be expressed as a rational function [Mizumoto 1989, p. 223].
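The role of the parameter γ in definition 3-14 can be illustrated with a few lines of code. The function name `hamacher_intersection` and the sample degrees are assumptions made for this sketch; the value 0 at the point where the denominator vanishes is a convention adopted here.

```python
def hamacher_intersection(a, b, gamma):
    """Hamacher intersection of two membership degrees for gamma >= 0."""
    if gamma < 0:
        raise ValueError("gamma must be nonnegative")
    denominator = gamma + (1.0 - gamma) * (a + b - a * b)
    if denominator == 0.0:
        # Occurs only for gamma = 0 and a = b = 0.
        return 0.0
    return a * b / denominator


a, b = 0.3, 0.6  # hypothetical degrees of membership

as_hamacher_product = hamacher_intersection(a, b, 0.0)   # gamma = 0
as_algebraic_product = hamacher_intersection(a, b, 1.0)  # gamma = 1
as_einstein_product = hamacher_intersection(a, b, 2.0)   # gamma = 2
```

For γ = 0 the denominator becomes a + b - ab (Hamacher product), for γ = 1 it becomes 1 (algebraic product), and for γ = 2 it becomes 2 - (a + b - ab) (Einstein product), so three of the nonparameterized t-norms listed earlier are members of this family.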


Definition 3-15 [Hamacher 1978]
The union of two fuzzy sets A and B is defined as

$A \cup B = \{(x, \mu_{A \cup B}(x)) \mid x \in X\}$

where

$\mu_{A \cup B}(x) = \dfrac{(\gamma' - 1)\mu_A(x)\mu_B(x) + \mu_A(x) + \mu_B(x)}{1 + \gamma' \mu_A(x)\mu_B(x)}, \quad \gamma' \ge -1$


For γ' = 0 the Hamacher-union-operator reduces to the algebraic sum.
Yager [1980] defined another triangular family of operators.


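Definition 3-16 [Yager 1980]
The intersection of fuzzy sets A and B is defined as
$A \cap B = \{(x, \mu_{A \cap B}(x)) \mid x \in X\}$

where

$\mu_{A \cap B}(x) = 1 - \min\{1, ((1 - \mu_A(x))^p + (1 - \mu_B(x))^p)^{1/p}\}, \quad p \ge 1$

The union of fuzzy sets is defined as
$A \cup B = \{(x, \mu_{A \cup B}(x)) \mid x \in X\}$

where

$\mu_{A \cup B}(x) = \min\{1, (\mu_A(x)^p + \mu_B(x)^p)^{1/p}\}, \quad p \ge 1$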


His intersection-operator converges to the min-operator for p → ∞ and his union operator to the max-operator for p → ∞.

For p = 1 the Yager-intersection becomes the “bold-intersection” of definition 3-10, and the union operator becomes the bold union. Both operators satisfy the DeMorgan laws and are commutative, associative for all p, and monotonically nondecreasing in μ(x); they also include the classical cases of dual logic. They are, however, not distributive.
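The limiting behavior of the Yager operators can be observed numerically. In the following sketch the function names and the sample degrees are assumptions, and p = 200 merely stands in for “large p”; it is not a value taken from the text.

```python
def yager_intersection(a, b, p):
    """Yager intersection of two membership degrees, p >= 1."""
    return 1.0 - min(1.0, ((1.0 - a) ** p + (1.0 - b) ** p) ** (1.0 / p))


def yager_union(a, b, p):
    """Yager union of two membership degrees, p >= 1."""
    return min(1.0, (a ** p + b ** p) ** (1.0 / p))


a, b = 0.7, 0.6  # hypothetical degrees of membership

bold_intersection = yager_intersection(a, b, 1)   # max{0, a + b - 1}
bold_union = yager_union(a, b, 1)                 # min{1, a + b}
near_min = yager_intersection(a, b, 200)          # approaches min{a, b}
near_max = yager_union(a, b, 200)                 # approaches max{a, b}
```

The DeMorgan relation can be read directly off the two formulas: replacing a and b by their complements in the union and complementing the result yields the intersection.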

Dubois and Prade [1980c, 1982a] also proposed a commutative and associative parameterized family of aggregation operators:


Definition 3-17 [Dubois and Prade 1980c, 1982a]
The intersection of two fuzzy sets A and B is defined as
$A \cap B = \{(x, \mu_{A \cap B}(x)) \mid x \in X\}$

where

$\mu_{A \cap B}(x) = \dfrac{\mu_A(x) \cdot \mu_B(x)}{\max\{\mu_A(x), \mu_B(x), \alpha\}}, \quad \alpha \in [0, 1]$


This intersection-operator is decreasing with respect to α and lies between $\min\{\mu_A(x), \mu_B(x)\}$ (which is the resulting operation for α = 0) and the algebraic product $\mu_A(x) \cdot \mu_B(x)$ (for α = 1). The parameter α is a kind of threshold, since the following relationships hold for the defined intersection operation [Dubois and Prade 1982a, p. 47]:

$\mu_{A \cap B}(x) = \min\{\mu_A(x), \mu_B(x)\}$ for $\mu_A(x), \mu_B(x) \in [\alpha, 1]$

$\mu_{A \cap B}(x) = \dfrac{\mu_A(x) \cdot \mu_B(x)}{\alpha}$ for $\mu_A(x), \mu_B(x) \in [0, \alpha]$


Definition 3-18 [Dubois and Prade 1980c, 1982a]
For the union of two fuzzy sets A and B, defined as
$A \cup B = \{(x, \mu_{A \cup B}(x)) \mid x \in X\}$
Dubois and Prade suggested the following operation, where $\alpha \in [0, 1]$:

$\mu_{A \cup B}(x) = \dfrac{\mu_A(x) + \mu_B(x) - \mu_A(x) \cdot \mu_B(x) - \min\{\mu_A(x), \mu_B(x), (1 - \alpha)\}}{\max\{(1 - \mu_A(x)), (1 - \mu_B(x)), \alpha\}}$
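The threshold character of α and the duality of definitions 3-17 and 3-18 can be checked with a brief sketch. The function names and the sample values are assumptions; the values returned where a denominator vanishes (0 for the intersection, 1 for the union) are conventions adopted here, not statements from the text.

```python
def dp_intersection(a, b, alpha):
    """Dubois-Prade intersection of two membership degrees, alpha in [0, 1]."""
    denominator = max(a, b, alpha)
    return a * b / denominator if denominator > 0 else 0.0


def dp_union(a, b, alpha):
    """Dubois-Prade union of two membership degrees, alpha in [0, 1]."""
    numerator = a + b - a * b - min(a, b, 1.0 - alpha)
    denominator = max(1.0 - a, 1.0 - b, alpha)
    return numerator / denominator if denominator > 0 else 1.0


alpha = 0.5  # hypothetical threshold

above_threshold = dp_intersection(0.6, 0.8, alpha)  # both degrees in [alpha, 1]
below_threshold = dp_intersection(0.2, 0.4, alpha)  # both degrees in [0, alpha]
```

Above the threshold the operator behaves like the minimum, below it like the product divided by α, in line with the two relationships stated for the intersection.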


All the operators mentioned so far include the case of dual logic as a special 
case. The question may arise: Why are there unique definitions for intersection 
(= and) and union (= or) in dual logic and traditional set theory and so many sug- 
gested definitions in fuzzy set theory? The answer is simply that many operators 
(for instance, product and min-operator) perform in exactly the same way if the 
degrees of membership are restricted to the values 0 or 1. If this restriction is no 
longer required, the operators lead to different results. 

This triggers yet another question: Are the only ways to “combine” or aggregate fuzzy sets the intersection or union (or the logical “and” or “or,” respectively)? Or are there other possibilities of aggregation? The answer to this latter question is definitely yes. There are other ways of combining fuzzy sets and fuzzy statements; “and” and “or” are only limiting special cases. Generalized models for the logical “and” and “or” are given by the “fuzzy and” and “fuzzy or” [Werners 1984]. Furthermore, a number of authors have suggested general connectives, which are (so far) of particular importance for decision analysis and for other applications of fuzzy set theory. These operators are general in the sense that they do not distinguish between the intersection and union of fuzzy sets.

Here we shall only mention some of these general connectives. A detailed discussion of them and the description of still others can be found in volume 2 in the context of decision making in fuzzy environments.


Averaging Operators. A straightforward approach for aggregating fuzzy sets 
(for instance, in the context of decision making) would be to use the aggregating 
procedures frequently used in utility theory or multicriteria decision theory. These 
procedures realize the idea of trade-offs between conflicting goals when com- 
pensation is allowed, and the resulting trade-offs lie between the most optimistic 
lower bound and the most pessimistic upper bound, that is, they map between the 
minimum and the maximum degree of membership of the aggregated sets. There- 
fore they are called averaging operators. Operators such as the weighted and 
unweighted arithmetic or geometric mean are examples of nonparametric aver- 
aging operators. In fact, they are adequate models for human aggregation pro- 
cedures in decision environments and have empirically performed quite well 
[Thole, Zimmermann, and Zysno 1979]. Procedures and results of empirical 
research done in the context of human decision making are investigated in section 
14.3. 

The fuzzy aggregation operators “fuzzy and” and “fuzzy or” suggested by Werners [1984] combine the minimum and maximum operator, respectively, with the arithmetic mean. The combination of these operators leads to very good results with respect to empirical data [Zimmermann and Zysno 1983] and allows compensation between the membership values of the aggregated sets.


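Definition 3-19 [Werners 1988, p. 297]
The “fuzzy and” operator is defined as

$\mu_{and}(\mu_A(x), \mu_B(x)) = \gamma \cdot \min\{\mu_A(x), \mu_B(x)\} + \dfrac{(1 - \gamma)(\mu_A(x) + \mu_B(x))}{2}$,
$x \in X$, $\gamma \in [0, 1]$

The “fuzzy or” operator is defined as

$\mu_{or}(\mu_A(x), \mu_B(x)) = \gamma \cdot \max\{\mu_A(x), \mu_B(x)\} + \dfrac{(1 - \gamma)(\mu_A(x) + \mu_B(x))}{2}$,
$x \in X$, $\gamma \in [0, 1]$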


The parameter γ indicates the degree of nearness to the strict logical meaning of “and” and “or,” respectively. For γ = 1, the “fuzzy and” becomes the minimum operator, and the “fuzzy or” reduces to the maximum operator. γ = 0 yields for both the arithmetic mean.
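The special cases of the parameter γ can be traced in a short sketch of definition 3-19. The function names `fuzzy_and` and `fuzzy_or` and the sample degrees are assumptions chosen for illustration.

```python
def fuzzy_and(a, b, gamma):
    """'Fuzzy and': convex combination of the minimum and the arithmetic mean."""
    return gamma * min(a, b) + (1.0 - gamma) * (a + b) / 2.0


def fuzzy_or(a, b, gamma):
    """'Fuzzy or': convex combination of the maximum and the arithmetic mean."""
    return gamma * max(a, b) + (1.0 - gamma) * (a + b) / 2.0


a, b = 0.2, 0.8  # hypothetical degrees of membership

strict_and = fuzzy_and(a, b, 1.0)   # minimum
strict_or = fuzzy_or(a, b, 1.0)     # maximum
mean_and = fuzzy_and(a, b, 0.0)     # arithmetic mean
mean_or = fuzzy_or(a, b, 0.0)       # arithmetic mean
half_and = fuzzy_and(a, b, 0.5)
half_or = fuzzy_or(a, b, 0.5)
```

For intermediate γ the “fuzzy and” lies between the minimum and the mean, and the “fuzzy or” between the mean and the maximum.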
Additional averaging aggregation procedures are symmetric summation operators, which, like the arithmetic or geometric mean operators, indicate some degree of compensation but in contrast to the latter are not associative. Examples of symmetric summation operators are the operators $M_1$, $M_2$, and $N_1$, $N_2$, known as symmetric summations and symmetric differences, respectively. Here the aggregation of two fuzzy sets A and B is pointwise defined as follows:

$M_1(\mu_A(x), \mu_B(x)) = \dfrac{\mu_A(x) + \mu_B(x) - \mu_A(x) \cdot \mu_B(x)}{1 + \mu_A(x) + \mu_B(x) - 2\mu_A(x) \cdot \mu_B(x)}$

$M_2(\mu_A(x), \mu_B(x)) = \dfrac{\mu_A(x) \cdot \mu_B(x)}{1 - \mu_A(x) - \mu_B(x) + 2\mu_A(x) \cdot \mu_B(x)}$

$N_1(\mu_A(x), \mu_B(x)) = \dfrac{\max\{\mu_A(x), \mu_B(x)\}}{1 + |\mu_A(x) - \mu_B(x)|}$

$N_2(\mu_A(x), \mu_B(x)) = \dfrac{\min\{\mu_A(x), \mu_B(x)\}}{1 - |\mu_A(x) - \mu_B(x)|}$

A detailed description of the properties of nonparametric averaging operators is reported by Dubois and Prade [1984]. For further details of symmetric summation operators, the reader is referred to Silvert [1979].

The above-mentioned averaging operators indicate a “fixed” compensation between the logical “and” and the logical “or.” In order to describe a variety of phenomena in decision situations, several operators with different compensations are necessary. An operator that is more general in the sense that the compensation between intersection and union is expressed by a parameter γ was suggested and empirically tested by Zimmermann and Zysno [1980] under the name “compensatory and.”


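Definition 3-20 [Zimmermann and Zysno 1980]
The “compensatory and” operator is defined as follows:

$\mu_{A_i, comp}(x) = \left(\prod_{i=1}^{m} \mu_i(x)\right)^{(1 - \gamma)} \left(1 - \prod_{i=1}^{m} (1 - \mu_i(x))\right)^{\gamma}, \quad x \in X,\ 0 \le \gamma \le 1$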


This “γ-operator” is obviously a combination of the algebraic product (modeling the logical “and”) and the algebraic sum (modeling the “or”). It is pointwise injective (except at zero and one), continuous, monotonic, and commutative. It also satisfies the DeMorgan laws and is in accordance with the truth tables of dual logic. The parameter γ indicates where the actual operator is located between the logical “and” and “or.”
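How the parameter γ moves the “compensatory and” between the algebraic product and the algebraic sum can be seen in the following sketch. The function name `compensatory_and` and the sample degrees are assumptions introduced for illustration.

```python
import math


def compensatory_and(degrees, gamma):
    """γ-operator of definition 3-20 for a sequence of membership degrees."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    product = math.prod(degrees)
    algebraic_sum = 1.0 - math.prod(1.0 - mu for mu in degrees)
    return product ** (1.0 - gamma) * algebraic_sum ** gamma


degrees = [0.4, 0.6]  # hypothetical degrees of membership of one element x

pure_and = compensatory_and(degrees, 0.0)   # algebraic product
pure_or = compensatory_and(degrees, 1.0)    # algebraic sum
halfway = compensatory_and(degrees, 0.5)    # geometric mean of the two
```

For γ = 0.5 the result is the geometric mean of the algebraic product and the algebraic sum, so it lies strictly between the two extreme cases whenever they differ.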
Other operators following the idea of parameterized compensation are defined by taking linear convex combinations of noncompensatory operators modeling the logical “and” and “or.” The aggregation of two fuzzy sets A and B by the convex combination between the min- and max-operator is defined as

$\mu_1(\mu_A(x), \mu_B(x)) = \gamma \cdot \min\{\mu_A(x), \mu_B(x)\} + (1 - \gamma) \cdot \max\{\mu_A(x), \mu_B(x)\}$, $\gamma \in [0, 1]$

Combining the algebraic product and algebraic sum, we obtain the following operation:

$\mu_2(\mu_A(x), \mu_B(x)) = \gamma \cdot \mu_A(x) \cdot \mu_B(x) + (1 - \gamma) \cdot [\mu_A(x) + \mu_B(x) - \mu_A(x) \cdot \mu_B(x)]$, $\gamma \in [0, 1]$

This class of operators is again in accordance with the dual logic truth tables. But Zimmermann and Zysno showed that the “compensatory and” operator is more adequate in human decision making than are these operators [Zimmermann and Zysno 1980, p. 50].

The relationships between different aggregation operators for aggregating two fuzzy sets A and B with respect to the three classes of t-norms, t-conorms, and averaging operators are represented in figure 3-2.

A taxonomy with respect to the compensatory property of distinguishing operators, which differentiate between the intersection and union of fuzzy sets, and general operators is presented in table 3-1. Table 3-2 summarizes the classes of aggregation operators for fuzzy sets reported in this chapter and compiles some references. Table 3-3 represents the relationship between parameterized families of operators and the presented t-norms and t-conorms with respect to special values of their parameters.


Figure 3-2. Mapping of t-norms, t-conorms, and averaging operators.


Table 3-1. Classification of compensatory and noncompensatory operators.

                  Distinguishing operators   General operators
Compensatory      fuzzy and                  compensatory and
                  fuzzy or                   convex combinations of min and max
                                             symmetric summations
                                             mean operators
Noncompensatory   t-norms
                  t-conorms
                  min
                  max


Ordered Weighted Averaging (OWA) Operators. Yager [Yager 1988] introduced a family of aggregation operators, so-called OWA operators, which belong to the class of mean operators. They are especially suited to, and intended for, the aggregation of (weighted) criteria in multicriteria decision making, which will be discussed in chapter 13. Yager uses the same idea that is behind definition 3-20, i.e., that for the aggregation of criteria an “operator” between the “logical and” and the “logical inclusive or” seems to be suitable. In contrast to the “compensatory and” defined in 3-20, Yager derives his suggestions by formal arguments rather than by scientific empirical tests:


Definition 3-21 [Yager 1993]
An OWA-operator is defined as follows:

$\mu_{OWA}(x) = \sum_{j} w_j \mu_j(x)$

where $w = \{w_1, w_2, \ldots, w_n\}$ is a vector of weights $w_i$ with $w_i \in [0, 1]$ and

$\sum_{i} w_i = 1$

$\mu_j(x)$ is the jth largest membership value for an element x for which the (aggregated) degree of membership shall be determined.

The rationale behind this operator is again the observation that for an “and” aggregation (modeled, e.g., by the min-operator) the smallest degree of membership is crucial, while for an “or” aggregation (modeled by “max”) the largest degree of membership of an element in all fuzzy sets is to be aggregated. Therefore, a basic aspect of this operator is the reordering step. In particular, the degree of membership of an element in a fuzzy set is not associated with a particular weight. Rather, a weight is associated with a particular ordered position of a degree of membership in the ordered set of relevant degrees of membership.


Example 3-2 


Let us consider the aggregation of the degrees of membership of an element x, which is contained in 10 fuzzy sets, $\mu_1(x)$ to $\mu_{10}(x)$.
Let the OWA weighting vector be:

w = (0.3, 0.20, 0.15, 0.12, 0.06, 0.05, 0.04, 0.03, 0.02, 0.02)
The degrees of membership of x in the 10 fuzzy sets, $\mu_i(x)$, are:
μ = (0.2, 0.3, 0.5, 0.8, 1, 0.6, 0.5, 0.4, 0.3, 0.2)
Reordering the $\mu_i(x)$ according to their values yields:
μ' = (1, 0.8, 0.6, 0.5, 0.5, 0.4, 0.3, 0.3, 0.2, 0.2)

$\mu_{OWA}(x)$ = (0.3)(1) + (0.2)(0.8) + (0.15)(0.6) + (0.12)(0.5) + (0.06)(0.5)
+ (0.05)(0.4) + (0.04)(0.3) + (0.03)(0.3) + (0.02)(0.2) + (0.02)(0.2)
= 0.689


Special vectors w correspond to typical aggregation operators. For instance:

w = (1, 0, 0, ..., 0) corresponds to the max-operator
w = (0, 0, ..., 0, 1) corresponds to the min-operator
w = (1/n, 1/n, ..., 1/n) corresponds to the arithmetic mean
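The reordering step and the special weight vectors can be reproduced with a minimal sketch of the OWA operator. The function name `owa` is an assumption; the weights and degrees are those of example 3-2, entered as printed (note that the printed weights add up to 0.99, so the sketch does not enforce a sum of 1).

```python
def owa(weights, degrees):
    """Ordered weighted average: weights apply to positions after sorting.

    The jth weight multiplies the jth largest membership degree.
    """
    if len(weights) != len(degrees):
        raise ValueError("weights and degrees must have the same length")
    ordered = sorted(degrees, reverse=True)
    return sum(w * mu for w, mu in zip(weights, ordered))


# Data of example 3-2, as printed there.
w = [0.3, 0.20, 0.15, 0.12, 0.06, 0.05, 0.04, 0.03, 0.02, 0.02]
mu = [0.2, 0.3, 0.5, 0.8, 1, 0.6, 0.5, 0.4, 0.3, 0.2]
example_value = owa(w, mu)

# Special weight vectors for three hypothetical degrees.
sample = [0.2, 0.9, 0.5]
as_max = owa([1, 0, 0], sample)
as_min = owa([0, 0, 1], sample)
as_mean = owa([1 / 3, 1 / 3, 1 / 3], sample)
```

Sorting in descending order is the only place where the order of the arguments matters; permuting `degrees` therefore leaves the result unchanged.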


Yager also defines a number of measures, two of which quantify the position of this operator between the “logical and” and the “logical or.” He defines the “orness” as

$\text{orness}(w) = \dfrac{1}{n - 1} \sum_{i=1}^{n} (n - i) w_i$

and “andness” as

$\text{andness}(w) = 1 - \text{orness}(w)$

and suggests that values of less than .5 for these measures indicate a bias to “and” or “or,” respectively (compare with definition 3-19!).
Yager also proposes other families of OWA-operators in [Yager 1993].
Finally, it shall be mentioned that some authors also suggest the fuzzy integral as an aggregation operator [Grabisch 1998].
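The two measures can be evaluated for the special weight vectors mentioned above. The function names `orness` and `andness` are assumptions chosen to mirror the notation of the text.

```python
def orness(weights):
    """Orness of an OWA weight vector: (1 / (n - 1)) * sum of (n - i) * w_i."""
    n = len(weights)
    if n < 2:
        raise ValueError("at least two weights are required")
    return sum((n - i) * w for i, w in enumerate(weights, start=1)) / (n - 1)


def andness(weights):
    """Andness is the complement of orness."""
    return 1.0 - orness(weights)


orness_of_max = orness([1, 0, 0])                # weight on the largest degree
orness_of_min = orness([0, 0, 1])                # weight on the smallest degree
orness_of_mean = orness([1 / 3, 1 / 3, 1 / 3])   # arithmetic mean
```

The max-operator attains orness 1, the min-operator orness 0, and the arithmetic mean lies exactly halfway.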


3.2.3 Criteria for Selecting Appropriate Aggregation Operators 


The variety of operators for the aggregation of fuzzy sets might be confusing and might make it difficult to decide which one to use in a specific model or situation. Which rules can be used for such a decision?

The following eight important criteria according to which operators can be classified are not entirely disjoint; hopefully they may be helpful in selecting the appropriate connective.


1. Axiomatic Strength. We have listed the axioms that Bellman-Giertz and Hamacher, respectively, wanted their operators to satisfy. Obviously, everything else being equal, an operator is better the less limiting the axioms it satisfies.

2. Empirical Fit. If fuzzy set theory is used as a modeling language for real 
situations or systems, it is not only important that the operators satisfy certain 
axioms or have certain formal qualities (such as associativity, commutativ- 
ity), which are certainly of importance from a mathematical point of view, 
but also the operators must be appropriate models of real-system behavior; 
and this can normally be proven only by empirical testing. 

3. Adaptability. It is rather unlikely that the type of aggregation is independent of the context and semantic interpretation, that is, whether the aggregation of fuzzy sets models a human decision, a fuzzy controller, a medical diagnostic system, or a specific inference rule in fuzzy logic. If one wants to use a very small number of operators to model many situations, then these operators have to be adaptable to the specific context. This can, for instance, be achieved by parameterization. Thus min- and max-operators cannot be adapted at all. They are acceptable in situations in which they fit and under no other circumstances. (Of course, they have other advantages, such as numerical efficiency.) By contrast, Yager’s operators or the γ-operator can be adapted to certain contexts by setting the p’s or γ’s appropriately, and OWA operators by choosing appropriate weight vectors.

4. Computational Efficiency. If one compares the min-operator with, for instance, Yager’s intersection operator or the γ-operator, it becomes quite obvious that the latter two require considerably more computational effort than the former. In practice, this might be quite important, in particular when large problems have to be solved.

5. Compensation. The logical “and” does not allow for compensation at all; that is, an element of the intersection of two sets cannot compensate for a low degree of belonging to one of the intersected sets by a higher degree of belonging to another of them. In (dual) logic, one cannot compensate by the higher truth of one statement for the lower truth of another statement when combining them by “and.” By compensation, in the context of aggregation operators for fuzzy sets, we mean the following: Given that the degree of membership to the aggregated fuzzy set is

$\mu_{Agg}(x_k) = f(\mu_A(x_k), \mu_B(x_k)) = k$

f is compensatory if $\mu_{Agg}(x_k) = k$ is obtainable for a different $\mu_A(x_k)$ by a change in $\mu_B(x_k)$. Thus the min-operator is not compensatory, while the product operator, the γ-operator, and so forth, are.

6. Range of Compensation. If one were to use a convex combination of min- and max-operator, a compensation could obviously occur in the range between min and max. The product operator allows compensation in the open interval (0, 1). In general, the larger the range of compensation, the better the compensatory operator.

7. Aggregating Behavior. If one considers normal or subnormal fuzzy sets, the degree of membership in the aggregated set depends very frequently on the number of sets combined. If one combines fuzzy sets by the product operator, for instance, each additional fuzzy set “added” will normally decrease the resulting aggregate degrees of membership. This might be a desirable feature; it might, however, also be inadequate. Goguen, for instance, argues that for formal reasons the resulting degree of membership should be nonincreasing [Goguen 1967].

8. Required Scale Level of Membership Functions. The scale level (nominal, interval, ratio, or absolute) on which membership information can be obtained depends on a number of factors. Different operators may require different scale levels of membership information to be admissible. (For instance, the min-operator is still admissible for ordinal information, while the product operator, strictly speaking, is not!) In general, again all else being equal, the operator that requires the lowest scale level is the most preferable from the point of view of information gathering.
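The notion of compensation used in criterion 5 can be illustrated numerically: after lowering one degree of membership, one searches for a value of the other degree that restores the original aggregate k. The helper `can_compensate`, the grid search, and all numerical values below are assumptions made for this sketch.

```python
def can_compensate(f, mu_a_new, target, steps=1000):
    """Return True if some mu_B on a grid over [0, 1] restores the target.

    f is a binary aggregation operator, mu_a_new the lowered degree of
    membership, and target the aggregate value k that should be recovered.
    """
    return any(
        abs(f(mu_a_new, i / steps) - target) < 1e-9
        for i in range(steps + 1)
    )


mu_a, mu_b = 0.6, 0.5   # hypothetical original degrees of membership
mu_a_lowered = 0.4      # the degree mu_A is reduced from 0.6 to 0.4

# Product operator: k = 0.3 is recovered with mu_B = 0.75.
product_compensates = can_compensate(
    lambda a, b: a * b, mu_a_lowered, mu_a * mu_b
)

# Min-operator: k = 0.5 cannot be recovered, since min(0.4, mu_B) <= 0.4.
min_compensates = can_compensate(min, mu_a_lowered, min(mu_a, mu_b))
```

The grid search is deliberately crude; it only demonstrates the existence or absence of a compensating value in this particular situation.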
Exercises


1. The product and the bounded difference have both been suggested as models for the intersection. Compute the intersection of fuzzy sets B and C from exercise 4 of chapter 2 and compare the three alternative models for the intersection: Minimum, product, and bounded difference.

2. The bounded sum and the algebraic sum have been suggested as alternative models for the union of fuzzy sets. Compute the union of the fuzzy sets B and C of exercise 4 of chapter 2 using the above-mentioned models, and compare the result with the result of exercise 4 of chapter 2.

3. Determine the intersection of B and C in exercise 4 of chapter 2 by using the

   a. Hamacher operator with γ = .25; .5; .75

   b. Yager operator with p = 1, 5, 10.

4. Which of the intersection operators mentioned in chapter 3 are compensatory and which not? Are the “compensatory” operators compensatory for the entire range [0, 1] and for the entire domain of their parameters (γ, p, etc.)? If not, what are the limits of compensation?

5. Prove that the following properties are satisfied by Yager’s union operator:

   a. $\mu_{A \cup B}(x) = \mu_A(x)$ for $\mu_B(x) = 0$

   b. $\mu_{A \cup B}(x) = 1$ for $\mu_B(x) = 1$

   c. $\mu_{A \cup B}(x) \ge \mu_A(x)$ for $\mu_A(x) = \mu_B(x)$

   d. For p → 0, the Yager union operator reduces to $s_w$ (drastic sum).

6. Show for the parameterized families of fuzzy union defined by Hamacher, Yager, and Dubois that the defining functions of these operators decrease with any increase in the parameter.


4 FUZZY MEASURES AND 
MEASURES OF FUZZINESS 


4.1 Fuzzy Measures 


In order to prevent confusion about fuzzy measures and measures of fuzziness, we shall first briefly describe the meaning and features of fuzzy measures. In the late 1970s, Sugeno defined a fuzzy measure as follows:

Sugeno [1977]: $\mathcal{B}$ is a Borel field of the arbitrary set (universe) X.


Definition 4-1


A set function g defined on $\mathcal{B}$ that has the following properties is called a fuzzy measure:


1. $g(\emptyset) = 0$, $g(X) = 1$.
2. If $A, B \in \mathcal{B}$ and $A \subseteq B$, then $g(A) \le g(B)$.
3. If $A_n \in \mathcal{B}$, $A_1 \subseteq A_2 \subseteq \ldots$, then $\lim_{n \to \infty} g(A_n) = g(\lim_{n \to \infty} A_n)$.


Sugeno’s measure differs from the classical measure essentially by relaxing the additivity property [Murofushi and Sugeno 1989, p. 201]. A different approach, however, is used by Klement and Schwyhla [1982]. The interested reader is referred to their article.

Banon [1981] shows that very many measures with finite universe, such as probability measures, belief functions, plausibility measures, and so on, are fuzzy measures in the sense of Sugeno. For this book, one measure, possibility, is of particular interest [see Dubois and Prade 1988a, p. 7].

In the framework of fuzzy set theory, Zadeh introduced the notion of a possi- 
bility distribution and the concept of a possibility measure, which is a special 
type of the fuzzy measure proposed by Sugeno. A possibility measure is defined 
as follows [Zadeh 1978; Higashi and Klir 1982]: 


Definition 4-2 


Let P(X) be the power set of a set X.
A possibility measure is a function Π: P(X) → [0, 1] with the properties


1. $\Pi(\emptyset) = 0$, $\Pi(X) = 1$
2. $A \subseteq B \Rightarrow \Pi(A) \le \Pi(B)$
3. $\Pi(\bigcup_{i \in I} A_i) = \sup_{i \in I} \Pi(A_i)$ with an index set I.

It can be uniquely determined by a possibility distribution function f: X → [0, 1] by $\Pi(A) = \sup_{x \in A} f(x)$, $A \subseteq X$. It follows directly that f is defined by $f(x) = \Pi(\{x\})$ $\forall x \in X$ [Klir and Folger 1988, p. 122].

A possibility is not always a fuzzy measure [Puri and Ralescu 1982]. It is, 
however, a fuzzy measure if X is finite and if the possibility distribution is 
normal—that is, a mapping into [0, 1]. 


Example 4-1 


Let X = {0, 1, ..., 10}.
Π({x}) := Possibility that x is close to 8.

[Table of Π({x}) for x = 0, 1, ..., 10; the entries used below are Π({2}) = 0, Π({5}) = .1, and Π({9}) = .8.]

Π(A) := Possibility that A contains an integer close to 8.
$A \subseteq X \Rightarrow \Pi(A) = \sup_{x \in A} \Pi(\{x\})$

For A = {2, 5, 9} we compute:

$\Pi(A) = \sup_{x \in A} \Pi(\{x\})$
= sup{Π({2}), Π({5}), Π({9})}
= sup{0, .1, .8}
= .8
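The computation of example 4-1 can be retraced with a small sketch. The function name `possibility` is an assumption, and so is the distribution below: only the degrees for x = 2, 5, and 9 are taken from the computation above, while all other degrees are hypothetical values inserted to obtain a complete, normal distribution.

```python
def possibility(distribution, subset):
    """Possibility of a crisp subset: supremum of the distribution over it.

    For the empty set the supremum is taken to be 0.
    """
    return max((distribution[x] for x in subset), default=0.0)


# Possibility that x is close to 8. Only the entries for 2, 5, and 9 are
# taken from the example; the remaining entries are assumed for illustration.
close_to_8 = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.1,
              6: 0.5, 7: 0.8, 8: 1.0, 9: 0.8, 10: 0.5}

A = {2, 5, 9}
pi_A = possibility(close_to_8, A)
pi_empty = possibility(close_to_8, set())
pi_X = possibility(close_to_8, close_to_8.keys())
```

Property 3 of definition 4-2 then states that the possibility of a union of subsets equals the largest of the individual possibilities.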


4.2 Measures of Fuzziness 


Measures of fuzziness, in contrast to fuzzy measures, try to indicate the degree 
of fuzziness of a fuzzy set. A number of approaches to this end have become 
known. Some authors, strongly influenced by the Shannon entropy as a measure 
of information, and following de Luca and Termini [1972], consider a measure 
of fuzziness as a mapping d from the power set P(X) to [0, +∞] that satisfies a
number of conditions. Others [Kaufmann 1975] suggested an index of fuzziness 
as a normalized distance, and others [Yager 1979; Higashi and Klir 1982] base 
their concept of a measure of fuzziness on the degree of distinction between the 
fuzzy set and its complement. 

We shall, as an illustration, discuss two of those measures. Suppose for both 
cases that the support of A is finite. 

The first is as follows: Let $\mu_A(x)$ be the membership function of the fuzzy set A for x ∈ X, X finite. It seems plausible that the measure of fuzziness d(A) should then have the following properties [de Luca and Termini 1972]:


1. d(A) = 0 if A is a crisp set in X.
2. d(A) assumes a unique maximum if $\mu_A(x) = \frac{1}{2}$ $\forall x \in X$.
3. $d(A) \ge d(A')$ if A' is “crisper” than A, i.e., if $\mu_{A'}(x) \le \mu_A(x)$ for $\mu_A(x) \le \frac{1}{2}$ and $\mu_{A'}(x) \ge \mu_A(x)$ for $\mu_A(x) \ge \frac{1}{2}$.
4. $d(\complement A) = d(A)$ where $\complement A$ is the complement of A.


De Luca and Termini suggested as a measure of fuzziness the “entropy”¹ of a fuzzy set [de Luca and Termini 1972, p. 305], which they defined as follows:


Definition 4-3a


The entropy as a measure of a fuzzy set $A = \{(x, \mu_A(x))\}$ is defined as

$d(A) = H(A) + H(\complement A), \quad x \in X$

$H(A) = -K \sum_{i=1}^{n} \mu_A(x_i) \ln \mu_A(x_i)$

where n is the number of elements in the support of A and K is a positive constant.

¹ Also employed in thermodynamics, information theory, and statistics [Capocelli and de Luca 1973].

Using Shannon’s function $S(x) = -x \ln x - (1 - x) \ln(1 - x)$, de Luca and Termini simplify the expression in definition 4-3a to arrive at the following definition.


Definition 4-3b


The entropy d as a measure of fuzziness of a fuzzy set $A = \{(x, \mu_A(x))\}$ is defined as

$d(A) = K \sum_{i=1}^{n} S(\mu_A(x_i))$.


Example 4-2
Let A = "integers close to 10" (see example 2-1d)
A = {(7, .1), (8, .5), (9, .8), (10, 1), (11, .8), (12, .5), (13, .1)}
Let K = 1, so
d(A) = .325 + .693 + .501 + 0 + .501 + .693 + .325 = 3.038

Furthermore, let B = "integers quite close to 10"

B = {(6, .1), (7, .3), (8, .4), (9, .7), (10, 1), (11, .8), (12, .5), (13, .3), (14, .1)}

d(B) = .325 + .611 + .673 + .611 + 0 + .501 + .693 + .611 + .325 = 4.35
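
The entropy of definition 4-3b can be checked numerically. The following is a minimal sketch, assuming a fuzzy set with finite support is stored as a Python dict that maps each element to its degree of membership; the function names `shannon` and `entropy` are illustrative and not part of the text.

```python
import math

def shannon(mu):
    """Shannon function S(mu) = -mu ln mu - (1 - mu) ln(1 - mu); S(0) = S(1) = 0."""
    if mu <= 0.0 or mu >= 1.0:
        return 0.0
    return -mu * math.log(mu) - (1.0 - mu) * math.log(1.0 - mu)

def entropy(fuzzy_set, k=1.0):
    """Entropy d(A) = K * sum of S(mu_A(x)) over the listed elements (definition 4-3b)."""
    return k * sum(shannon(mu) for mu in fuzzy_set.values())

# Sets A and B of example 4-2 (assumed dict encoding: element -> membership)
A = {7: .1, 8: .5, 9: .8, 10: 1.0, 11: .8, 12: .5, 13: .1}
B = {6: .1, 7: .3, 8: .4, 9: .7, 10: 1.0, 11: .8, 12: .5, 13: .3, 14: .1}

print(round(entropy(A), 2), round(entropy(B), 2))
```

Up to the rounding of the individual terms, the two values agree with d(A) and d(B) as computed in example 4-2.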


The second measure is as follows: Knopfmacher [1975], Loo [1977], Gottwald
[1979b], and others based their contributions in some respects on de Luca and
Termini's suggestion.

If $A$ is a fuzzy set in $X$ and $\complement A$ is its complement, then in contrast to crisp sets,
it is not necessarily true that

$$A \cup \complement A = X$$

$$A \cap \complement A = \emptyset$$

This means that fuzzy sets do not always satisfy the law of the excluded middle,
which is one of their major distinctions from traditional crisp sets. Some authors
[Yager 1979; Higashi and Klir 1982] consider the relationship between $A$ and $\complement A$
to be the essence of fuzziness.
Yager [1979] notes that the requirement of distinction between $A$ and $\complement A$ is
not satisfied by fuzzy sets. He therefore suggests that any measure of fuzziness
should be a measure of the lack of distinction between $A$ and $\complement A$ or $\mu_A(x)$ and
$\mu_{\complement A}(x)$.


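As a possible metric to measure the distance between a fuzzy set and its complement,
Yager suggests:


Definition 4-4

$$D_p(A, \complement A) = \left[ \sum_{i=1}^{n} \left| \mu_A(x_i) - \mu_{\complement A}(x_i) \right|^p \right]^{1/p}, \quad p = 1, 2, 3, \dots$$

Let $S = \mathrm{supp}(A)$: $D_p(S, \complement S) = \|S\|^{1/p}$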


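Definition 4-5 [Yager 1979]

A measure of the fuzziness of $A$ can be defined as

$$f_p(A) = 1 - \frac{D_p(A, \complement A)}{\|\mathrm{supp}(A)\|}$$

So $f_p(A) \in [0, 1]$. In this quotient, $\|\mathrm{supp}(A)\|$ stands for the distance
$D_p(S, \complement S) = \|S\|^{1/p}$ of definition 4-4, which is how it is evaluated in example 4-3.
This measure also satisfies properties 1 to 4 required by de Luca
and Termini (see above).
For $p = 1$, $D_p(A, \complement A)$ yields the Hamming metric

$$D_1(A, \complement A) = \sum_{i=1}^{n} \left| \mu_A(x_i) - \mu_{\complement A}(x_i) \right|$$

Because $\mu_{\complement A}(x) = 1 - \mu_A(x)$, this becomes

$$D_1(A, \complement A) = \sum_{i=1}^{n} \left| 2\mu_A(x_i) - 1 \right|$$

For $p = 2$, we arrive at the Euclidean metric

$$D_2(A, \complement A) = \left( \sum_{i=1}^{n} \left( \mu_A(x_i) - \mu_{\complement A}(x_i) \right)^2 \right)^{1/2}$$

and for $\mu_{\complement A}(x) = 1 - \mu_A(x)$, we have

$$D_2(A, \complement A) = \left( \sum_{i=1}^{n} \left( 2\mu_A(x_i) - 1 \right)^2 \right)^{1/2}$$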
Example 4-3


Let A = "integers close to 10" and
B = "integers quite close to 10" be defined as in example 4-2.
Applying the above derived formula, we compute for p = 1:

$$D_1(A, \complement A) = .8 + 0 + .6 + 1 + .6 + 0 + .8 = 3.8$$

$$\|\mathrm{supp}(A)\| = 7$$

so $f_1(A) = 1 - \frac{3.8}{7} = 0.457$.

Analogously,

$$D_1(B, \complement B) = 4.6$$

$$\|\mathrm{supp}(B)\| = 9$$

so $f_1(B) = 1 - \frac{4.6}{9} = 0.489$.


Similarly, for p = 2, we obtain

$$D_2(A, \complement A) = 1.73$$

$$\|\mathrm{supp}(A)\| = 2.65$$

so $f_2(A) = 1 - \frac{1.73}{2.65} = 0.347$, and

$$D_2(B, \complement B) = 1.78$$

$$\|\mathrm{supp}(B)\| = 3$$

so $f_2(B) = 1 - \frac{1.78}{3} = 0.407$.
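
The computations of example 4-3 can be reproduced with a few lines of code. This is a sketch under stated assumptions: fuzzy sets are dicts (element to membership), the complement is taken as 1 minus the membership, and the denominator is the p-th root of the size of the support, which is the reading suggested by the values 7, 9, 2.65 and 3 used in the example. The function name `yager_fuzziness` is illustrative.

```python
def yager_fuzziness(fuzzy_set, p=1):
    """f_p(A) = 1 - D_p(A, CA) / |supp(A)|^(1/p), with mu_CA = 1 - mu_A (assumed)."""
    support = [mu for mu in fuzzy_set.values() if mu > 0]
    # Distance between A and its complement: |mu - (1 - mu)| = |2 mu - 1|
    distance = sum(abs(2 * mu - 1) ** p for mu in support) ** (1 / p)
    return 1 - distance / len(support) ** (1 / p)

A = {7: .1, 8: .5, 9: .8, 10: 1.0, 11: .8, 12: .5, 13: .1}
B = {6: .1, 7: .3, 8: .4, 9: .7, 10: 1.0, 11: .8, 12: .5, 13: .3, 14: .1}

for p in (1, 2):
    print(p, round(yager_fuzziness(A, p), 3), round(yager_fuzziness(B, p), 3))
```

Because the text rounds intermediate results (1.73 and 2.65), the unrounded value for p = 2 and set A differs slightly in the third decimal from the figure quoted in the example.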


The reader should realize that the complement of a fuzzy set is not uniquely 
defined [see Bellman and Giertz 1973; Dubois and Prade 1982a; Lowen 1978]. 
It is therefore not surprising that for other definitions of the complement and for 
other measures of distance, other measures of fuzziness will result, even though 
they all focus on the distinction between a fuzzy set and its complement [see, for 
example, Klir 1987, p. 141]. Those variations, as well as extension of measures 
of fuzziness to nonfinite supports, will not be considered here; neither will the 
approaches that define fuzzy measures of fuzzy sets [Yager 1979]. 
Exercises


1. Let A be defined as in example 4-2.

   B' = {(8, .5), (9, .9), (10, 1), (11, .8), (12, .5)}

   C' = {(6, .1), (7, .1), (8, .5), (9, .8), (10, 1), (11, .8), (12, .5), (13, .1),
   (14, .1)}

   Is A crisper than B' (or C')?

   Compute as measures of fuzziness:

   a. the entropy (with K = 1)

   b. $f_1$

   c. $f_2$ for all three sets.

   Compare the results.

2. Determine the maximum of the entropy d(A) in dependence of the cardinality
   of the support of A.

3. Consider A as in exercise 1. Determine $A \cap \complement A$ and $A \cup \complement A$. For which
   (special) fuzzy sets does the equality hold?

4. Consider example 4-1. Compute the possibilities of the following sets:

   $A_1 = \{1, 2, 3, 4, 5, 6\}$, $A_2 = \{1, 5, 8, 9\}$, $A_3 = \{7, 9\}$


5 THE EXTENSION 
PRINCIPLE AND 
APPLICATIONS 


5.1 The Extension Principle 


One of the most basic concepts of fuzzy set theory that can be used to general- 
ize crisp mathematical concepts to fuzzy sets is the extension principle. In its ele- 
mentary form, it was already implied in Zadeh’s first contribution [1965]. In the 
meantime, modifications have been suggested [Zadeh 1973a; Zadeh et al. 1975; 
Jain 1976]. Following Zadeh [1973a] and Dubois and Prade [1980a], we define 
the extension principle as follows: 


Definition 5-1


Let $X$ be a Cartesian product of universes $X = X_1 \times \dots \times X_r$, and $A_1, \dots, A_r$ be $r$
fuzzy sets in $X_1, \dots, X_r$, respectively. $f$ is a mapping from $X$ to a universe $Y$,
$y = f(x_1, \dots, x_r)$. Then the extension principle allows us to define a fuzzy set $B$ in $Y$
by

$$B = \{(y, \mu_B(y)) \mid y = f(x_1, \dots, x_r),\ (x_1, \dots, x_r) \in X\}$$

where

$$\mu_B(y) = \begin{cases} \sup_{(x_1, \dots, x_r) \in f^{-1}(y)} \min\{\mu_{A_1}(x_1), \dots, \mu_{A_r}(x_r)\} & \text{if } f^{-1}(y) \neq \emptyset \\ 0 & \text{otherwise} \end{cases}$$

where $f^{-1}$ is the inverse of $f$.
For $r = 1$, the extension principle, of course, reduces to

$$B = f(A) = \{(y, \mu_B(y)) \mid y = f(x),\ x \in X\}$$

where

$$\mu_B(y) = \begin{cases} \sup_{x \in f^{-1}(y)} \mu_A(x) & \text{if } f^{-1}(y) \neq \emptyset \\ 0 & \text{otherwise} \end{cases}$$
Example 5-1
Let A = {(-1, .5), (0, .8), (1, 1), (2, .4)}
$f(x) = x^2$


Then by applying the extension principle, we obtain
B = f(A) = {(0, .8), (1, 1), (4, .4)}
Figure 5-1 illustrates the relationship. 
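
For a discrete fuzzy set and r = 1, the extension principle amounts to taking, for every image point, the largest membership among its preimages. A minimal sketch, assuming a dict encoding of the fuzzy set (the helper name `extend` is illustrative):

```python
def extend(fuzzy_set, f):
    """Image of a discrete fuzzy set under f: mu_B(y) = sup of mu_A(x) over f(x) = y."""
    image = {}
    for x, mu in fuzzy_set.items():
        y = f(x)
        # Keep the supremum (here: maximum) over all x mapped to the same y
        image[y] = max(image.get(y, 0.0), mu)
    return image

A = {-1: .5, 0: .8, 1: 1.0, 2: .4}
B = extend(A, lambda x: x ** 2)
print(sorted(B.items()))
```

The elements -1 and 1 are both mapped to 1, so the image point 1 receives the larger of the two memberships, as in example 5-1.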
The extension principle as stated in definition 5—1 can and has been modified by 
using the algebraic sum (definition 3—8) rather than sup, and the product rather 


than min [Dubois and Prade 1980a]. Since, however, it is generally used as defined 
in definition 5—1, we will restrict our considerations to this “classical” version. 


5.2 Operations for Type 2 Fuzzy Sets 


The extension principle can be used to define set-theoretic operations for type 2 
fuzzy sets as defined in definition 3-1. 

We shall consider only fuzzy sets of type 2 with discrete domains. Let two
fuzzy sets of type 2 be defined by

$$A(x) = \{x, \mu_A(x)\} \quad \text{and} \quad B(x) = \{x, \mu_B(x)\}$$

where

$$\mu_A(x) = \{(u_i, \mu_{u_i}(x)) \mid x \in X,\ u_i, \mu_{u_i}(x) \in [0, 1]\}$$

$$\mu_B(x) = \{(v_j, \mu_{v_j}(x)) \mid x \in X,\ v_j, \mu_{v_j}(x) \in [0, 1]\}$$
Figure 5-1. The extension principle.


The $u_i$ and $v_j$ are degrees of membership of type 1 fuzzy sets and the $\mu_{u_i}(x)$ and
$\mu_{v_j}(x)$, respectively, their membership functions. Using the extension principle, the
set-theoretic operations can be defined as follows [Mizumoto and Tanaka 1976]:


Definition 5-2


Let two fuzzy sets of type 2 be defined as above. The membership function of
their union is then defined by

$$\mu_{A \cup B}(x) = \mu_A(x) \cup \mu_B(x) = \{(w, \mu_{A \cup B}(w)) \mid w = \max\{u_i, v_j\},\ u_i, v_j \in [0, 1]\}$$

where

$$\mu_{A \cup B}(w) = \sup_{w = \max\{u_i, v_j\}} \min\{\mu_{u_i}(x), \mu_{v_j}(x)\}$$

Their intersection is defined by

$$\mu_{A \cap B}(x) = \mu_A(x) \cap \mu_B(x) = \{(w, \mu_{A \cap B}(w)) \mid w = \min\{u_i, v_j\},\ u_i, v_j \in [0, 1]\}$$

where

$$\mu_{A \cap B}(w) = \sup_{w = \min\{u_i, v_j\}} \min\{\mu_{u_i}(x), \mu_{v_j}(x)\}$$

and the complement of $A$ by

$$\mu_{\complement A}(x) = \{[(1 - u_i), \mu_{u_i}(x)]\}$$


Example 5-2


Let X = 1, ..., 10, A = small integers
B = integers close to 4
defined by
$A = \{(x, \mu_A(x))\}$
$B = \{(x, \mu_B(x))\}$
where, for x = 3,
$\mu_A(3) = \{(u_i, \mu_{u_i}(3)) \mid i = 1, \dots, 3\}$
= {(.8, 1), (.7, .5), (.6, .4)}
$\mu_B(3) = \{(v_j, \mu_{v_j}(3)) \mid j = 1, \dots, 3\}$
= {(1, 1), (.8, .5), (.7, .3)}


Compute $\mu_{A \cap B}(3)$:

 $u_i$   $v_j$   $w = \min\{u_i, v_j\}$   $\mu_{u_i}(3)$   $\mu_{v_j}(3)$   $\min\{\mu_{u_i}(3), \mu_{v_j}(3)\}$
 .8    1     .8     1     1     1
 .8    .8    .8     1     .5    .5
 .8    .7    .7     1     .3    .3
 .7    1     .7     .5    1     .5
 .7    .8    .7     .5    .5    .5
 .7    .7    .7     .5    .3    .3
 .6    1     .6     .4    1     .4
 .6    .8    .6     .4    .5    .4
 .6    .7    .6     .4    .3    .3


Next, compute the supremum of the degrees of membership of all pairs $(u_i, v_j)$
that yield $w$ as minimum:

$$\sup_{.8 = \min\{u_i, v_j\}} \{1, .5\} = 1$$

$$\sup_{.7 = \min\{u_i, v_j\}} \{.3, .5, .5, .3\} = .5$$

$$\sup_{.6 = \min\{u_i, v_j\}} \{.4, .4, .3\} = .4$$

So we obtain the membership function of x = 3 as the fuzzy set
$\mu_{A \cap B}(3)$ = {(.8, 1), (.7, .5), (.6, .4)}
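
The table-and-supremum procedure of example 5-2 can be sketched in code. The sketch assumes that the fuzzy grade at one element x is stored as a dict mapping each primary degree u to its secondary grade; the helper name `type2_combine` is illustrative and not from Mizumoto and Tanaka.

```python
from itertools import product

def type2_combine(mu_a, mu_b, op):
    """Combine two fuzzy grades at one element x (definition 5-2).

    op is min for the intersection and max for the union. For every pair
    (u, v) the result w = op(u, v) receives the grade min of the two
    secondary grades; the supremum is taken over all pairs yielding w.
    """
    result = {}
    for (u, grade_u), (v, grade_v) in product(mu_a.items(), mu_b.items()):
        w = op(u, v)
        result[w] = max(result.get(w, 0.0), min(grade_u, grade_v))
    return result

# Fuzzy grades of A and B at x = 3 from example 5-2
mu_a3 = {.8: 1.0, .7: .5, .6: .4}
mu_b3 = {1.0: 1.0, .8: .5, .7: .3}

intersection = type2_combine(mu_a3, mu_b3, min)
union = type2_combine(mu_a3, mu_b3, max)
print(sorted(intersection.items(), reverse=True))
```

The nine pairs visited by the loop correspond to the nine rows of the table above; the same helper with `max` gives the union at x = 3.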


Mizumoto and Tanaka [1976, p. 318] show that type 2 fuzzy sets as defined above
are idempotent, commutative, and associative and satisfy the DeMorgan laws.
They are, however, not distributive and do not satisfy the absorption laws, the
identity laws, or the complement laws.

Example 5-2 is a good indication of the computational effort involved in operations
with type 2 fuzzy sets. The reader should realize that in this example the
degrees of membership of only one element of the type 2 fuzzy set are computed.
For all other elements, such as x = 4, x = 5, etc., of the sets A * B, the corresponding
calculations would be necessary. Here "*" can be any set-theoretic
operation mentioned so far.


5.3 Algebraic Operations with Fuzzy Numbers 


Definition 5-3
A fuzzy number $M$ is a convex normalized fuzzy set $M$ of the real line $\mathbb{R}$ such that


1. There exists exactly one $x_0 \in \mathbb{R}$ with $\mu_M(x_0) = 1$ ($x_0$ is called the mean value of
   $M$).
2. $\mu_M(x)$ is piecewise continuous.


Nowadays, definition 5-3 is very often modified. For the sake of computational 
efficiency and ease of data acquisition, trapezoidal membership functions are 
often used. Figure 5-2 shows such a fuzzy set, which could be called "approximately 5"
and which would normally be defined as the quadruple {3, 4, 6, 7}.
Strictly speaking, it is a fuzzy interval (see section 5.3.2). A triangular fuzzy 
number is, of course, a special case of this. 


Definition 5-4 


A fuzzy number $M$ is called positive (negative) if its membership function is such
that $\mu_M(x) = 0$, $\forall x < 0$ ($\forall x > 0$).


Figure 5-2. Trapezoidal "fuzzy number."


Example 5-3 
The following fuzzy sets are fuzzy numbers: 
approximately 5 = {(3, .2), (4, .6), (5, 1), (6, .7), (7, .1)} 
approximately 10 = {(8, .3), (9, .7), (10, 1), (11, .7), (12, .3)} 
But {(3, .8), (4, 1), (5, 1), (6, .7)} is not a fuzzy number because $\mu(4) = 1$ and also
$\mu(5) = 1$.


We are all familiar with algebraic operations with crisp numbers. If we want to 
use fuzzy sets in applications, we will have to deal with fuzzy numbers, and the 
extension principle is one way to extend algebraic operations from crisp to fuzzy 
numbers. 

We need a few more definitions: Let $F(\mathbb{R})$ be the set of real fuzzy numbers
and $X = X_1 \times X_2$. We can define the following properties of binary operations:


Definition 5-5
A binary operation $*$ in $\mathbb{R}$ is called increasing (decreasing) if


for $x_1 > y_1$ and $x_2 > y_2$
$x_1 * x_2 > y_1 * y_2$ ($x_1 * x_2 < y_1 * y_2$)

Example 5-4
$f(x, y) = x + y$ is an increasing operation.
$f(x, y) = x \cdot y$ is an increasing operation on $\mathbb{R}^+$.
$f(x, y) = -(x + y)$ is a decreasing operation.

If the normal algebraic operations $+, -, \cdot, :$ are extended to operations on fuzzy
numbers, they shall be denoted by $\oplus, \ominus, \odot, \oslash$.


Theorem 5-1 [See Dubois and Prade 1980a, p. 44]


If $M$ and $N$ are fuzzy numbers whose membership functions are continuous and
surjective from $\mathbb{R}$ to $[0, 1]$ and $*$ is a continuous increasing (decreasing) binary
operation, then $M \circledast N$ is a fuzzy number whose membership function is continuous
and surjective from $\mathbb{R}$ to $[0, 1]$.

Dubois and Prade [1980a] present procedures to determine the membership
functions $\mu_{M \circledast N}$ on the basis of $\mu_M$ and $\mu_N$.


Theorem 5-2


If $M, N \in F(\mathbb{R})$ with $\mu_N(x)$ and $\mu_M(x)$ continuous membership functions, then by
application of the extension principle for the binary operation $*: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$,
the membership function of the fuzzy number $M \circledast N$ is given by

$$\mu_{M \circledast N}(z) = \sup_{z = x * y} \min\{\mu_M(x), \mu_N(y)\}$$


Properties of the extended operation $\circledast$


Remark 5-1 [Dubois and Prade 1980a, p. 45]


1. For any commutative operation $*$, the extended operation $\circledast$ is also
   commutative.
2. For any associative operation $*$, the extended operation $\circledast$ is also associative.


5.3.1 Special Extended Operations 


For unary operations $f: X \to Y$, $X = X_1$ (see definition 5-1), the extension
principle reduces for all $M \in F(\mathbb{R})$ to

$$\mu_{f(M)}(z) = \sup_{x \in f^{-1}(z)} \mu_M(x)$$


Example 5-5


1. For $f(x) = -x$, the opposite of a fuzzy number $M$ is given by $-M = \{(x, \mu_{-M}(x)) \mid x \in X\}$,
   where $\mu_{-M}(x) = \mu_M(-x)$.

2. If $f(x) = \frac{1}{x}$, then the inverse of a fuzzy number $M$ is given by
   $M^{-1} = \{(x, \mu_{M^{-1}}(x)) \mid x \in X\}$, where $\mu_{M^{-1}}(x) = \mu_M\left(\frac{1}{x}\right)$.

3. For $\lambda \in \mathbb{R} \setminus \{0\}$ and $f(x) = \lambda \cdot x$, the scalar multiplication of a fuzzy
   number is given by $\lambda M = \{(x, \mu_{\lambda M}(x)) \mid x \in X\}$, where $\mu_{\lambda M}(x) = \mu_M\left(\frac{x}{\lambda}\right)$,
   since $f^{-1}(x) = x / \lambda$.


In the following, we shall apply the extension principle to binary operations. A 
generalization to n-ary operations is straightforward. 


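Extended Addition. Since addition is an increasing operation, according to
theorem 5-1 we get for the extended addition $\oplus$ of fuzzy numbers that
$f(N, M) = N \oplus M$, $N, M \in F(\mathbb{R})$ is a fuzzy number, that is, $N \oplus M \in F(\mathbb{R})$.


Properties of $\oplus$

1. $\ominus(M \oplus N) = (\ominus M) \oplus (\ominus N)$.
2. $\oplus$ is commutative.
3. $\oplus$ is associative.
4. $0 \in \mathbb{R} \subset F(\mathbb{R})$ is the neutral element for $\oplus$, that is, $M \oplus 0 = M$, $\forall M \in F(\mathbb{R})$.
5. For $\oplus$ there does not exist an inverse element, that is, $\forall M \in F(\mathbb{R}) \setminus \mathbb{R}$:
   $M \oplus (\ominus M) \neq 0 \in \mathbb{R}$.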


One of the consequences [Yager 1980] is that fuzzy equations are very difficult 
to solve because the variables cannot be eliminated as usual. 


Extended Product. Multiplication is an increasing operation on $\mathbb{R}^+$ and a
decreasing operation on $\mathbb{R}^-$. Hence, according to theorem 5-1, the product of
positive fuzzy numbers or of negative fuzzy numbers results in a positive fuzzy
number. Let $M$ be a positive and $N$ a negative fuzzy number. Then $\ominus M$ is also
negative and $M \odot N = \ominus(\ominus M \odot N)$ results in a negative fuzzy number.


Properties of $\odot$

1. $(\ominus M) \odot N = \ominus(M \odot N)$.
2. $\odot$ is commutative.
3. $\odot$ is associative.
4. $1 \in \mathbb{R} \subset F(\mathbb{R})$ is the neutral element for $\odot$, that is, $M \odot 1 = M$,
   $\forall M \in F(\mathbb{R})$.
5. For $\odot$ there does not exist an inverse element, that is, $\forall M \in F(\mathbb{R}) \setminus \mathbb{R}$:
   $M \odot M^{-1} \neq 1$.


Theorem 5-3 [for the proof, see Dubois and Prade 1980a, p. 51] 


If $M$ is either a positive or a negative fuzzy number and $N$ and $P$ are both either
positive or negative fuzzy numbers, then

$$M \odot (N \oplus P) = (M \odot N) \oplus (M \odot P)$$


Extended Subtraction. Subtraction is neither an increasing nor a decreasing
operation. Therefore theorem 5-1 is not immediately applicable. The operation
$M \ominus N$ can, however, always be written as $M \ominus N = M \oplus (\ominus N)$.

Applying the extension principle [Dubois and Prade 1979] yields

$$\mu_{M \ominus N}(z) = \sup_{z = x - y} \min(\mu_M(x), \mu_N(y))$$

$$= \sup_{z = x + y} \min(\mu_M(x), \mu_N(-y))$$

$$= \sup_{z = x + y} \min(\mu_M(x), \mu_{-N}(y))$$

Thus $M \ominus N$ is a fuzzy number whenever $M$ and $N$ are.


Extended Division. Division is also neither an increasing nor a decreasing
operation. If $M$ and $N$ are strictly positive fuzzy numbers, however (that
is, $\mu_M(x) = 0$ and $\mu_N(x) = 0$ $\forall x \leq 0$), we obtain in analogy to the extended
subtraction

$$\mu_{M \oslash N}(z) = \sup_{z = x / y} \min(\mu_M(x), \mu_N(y))$$

$$= \sup_{z = x \cdot y} \min\left(\mu_M(x), \mu_N\left(\tfrac{1}{y}\right)\right)$$

$$= \sup_{z = x \cdot y} \min(\mu_M(x), \mu_{N^{-1}}(y))$$

$N^{-1}$ is a positive fuzzy number. Hence theorem 5-1 can now be applied. The same
is true if $M$ and $N$ are both strictly negative fuzzy numbers.

Similar results can be obtained by using other than the min-max operations— 
for instance, those of definitions 3—7 through 3-11. 
Extended operations with fuzzy numbers involve rather extensive computations
as long as no restrictions are put on the type of membership functions
allowed. Dubois and Prade [1979] propose a general algorithm for perform- 
ing extended operations. For practical purposes, however, it will generally be 
more appropriate to resort to specific kinds of fuzzy numbers, as they are 
described in the next section. The generality is not limited considerably by 
limiting extended operations to fuzzy numbers in LR-representation or even to 
triangular fuzzy numbers [van Laarhoven and Pedrycz 1983], and the com- 
putational effort is very much decreased. The reader should also realize that 
extended operations on the basis of min-max cannot be directly applied to “fuzzy 
numbers” with discrete supports. As illustrated by example 5-6, the resulting 
fuzzy sets may no longer be convex and therefore no longer considered as fuzzy 
numbers. 


Example 5-6 


Let M = {(1, .3), (2, 1), (3, .4)}
N = {(2, .7), (3, 1), (4, .2)}
Then

$M \odot N$ = {(2, .3), (3, .3), (4, .7), (6, 1), (8, .2), (9, .4), (12, .2)}
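
Example 5-6 can be verified by applying the sup-min formula of theorem 5-2 directly to the listed points. The sketch below assumes a dict encoding of the discrete sets; `extended_op` and `is_convex` are illustrative helper names, and convexity is checked only on the listed support points (memberships rise to the peak and fall afterwards).

```python
from itertools import product
import operator

def extended_op(m, n, op):
    """Sup-min extension of a binary operation to two discrete fuzzy sets."""
    result = {}
    for (x, mu_x), (y, mu_y) in product(m.items(), n.items()):
        z = op(x, y)
        result[z] = max(result.get(z, 0.0), min(mu_x, mu_y))
    return dict(sorted(result.items()))

def is_convex(fuzzy_set):
    """True if memberships rise up to the peak and fall afterwards (on listed points)."""
    grades = [fuzzy_set[x] for x in sorted(fuzzy_set)]
    peak = grades.index(max(grades))
    rising = all(a <= b for a, b in zip(grades[:peak], grades[1:peak + 1]))
    falling = all(a >= b for a, b in zip(grades[peak:], grades[peak + 1:]))
    return rising and falling

M = {1: .3, 2: 1.0, 3: .4}
N = {2: .7, 3: 1.0, 4: .2}
product_mn = extended_op(M, N, operator.mul)
print(product_mn, is_convex(product_mn))
```

The membership at 9 (.4) exceeds the membership at 8 (.2) although both lie to the right of the peak at 6, which is why the result is not convex.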


5.3.2 Extended Operations for LR-Representation of Fuzzy Sets 


Computational efficiency is of particular importance when using fuzzy set theory 
to solve real problems, that is, problems of realistic size. In the following, there- 
fore, we shall consider in detail the LR-representation of fuzzy sets, which 
increases computational efficiency without limiting the generality beyond ac- 
ceptable limits. 

Dubois and Prade [1979] suggest a special type of representation for fuzzy
numbers of the following type: They call $L$ (and $R$), which map $\mathbb{R}^+ \to [0, 1]$ and
are decreasing, shape functions if $L(0) = 1$; $L(x) < 1$ for $\forall x > 0$; $L(x) > 0$ for
$\forall x < 1$; $L(1) = 0$ or [$L(x) > 0$, $\forall x$ and $L(+\infty) = 0$].


Definition 5-6


A fuzzy number $M$ is of LR-type if there exist reference functions $L$ (for left), $R$
(for right), and scalars $\alpha > 0$, $\beta > 0$ with

$$\mu_M(x) = \begin{cases} L\left(\dfrac{m - x}{\alpha}\right) & \text{for } x \leq m \\[2ex] R\left(\dfrac{x - m}{\beta}\right) & \text{for } x \geq m \end{cases}$$

$m$, called the mean value of $M$, is a real number, and $\alpha$ and $\beta$ are called the left
and right spreads, respectively. Symbolically, $M$ is denoted by $(m, \alpha, \beta)_{LR}$. (See
figure 5-3.)


Figure 5-3. LR-representation of fuzzy numbers.

For $L(z)$, different functions can be chosen. Dubois and Prade [1988a, p. 50]
mention, for instance, $L(x) = \max(0, 1 - x)^p$, $L(x) = \max(0, 1 - x^p)$, with $p > 0$,
and $L(x) = e^{-x}$ or $L(x) = e^{-x^2}$. These examples already give an impression of the
wide scope of $L(z)$. One problem, of course, is to find the appropriate function in
a specific context.








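Example 5-7
Let

$$L(x) = \frac{1}{1 + x^2}$$

$$R(x) = \frac{1}{1 + 2|x|}$$

$$\alpha = 2, \quad \beta = 3, \quad m = 5$$

Then

$$\mu_M(x) = \begin{cases} L\left(\dfrac{5 - x}{2}\right) = \dfrac{1}{1 + \left(\dfrac{5 - x}{2}\right)^2} & \text{for } x \leq 5 \\[3ex] R\left(\dfrac{x - 5}{3}\right) = \dfrac{1}{1 + \left|\dfrac{2(x - 5)}{3}\right|} & \text{for } x \geq 5 \end{cases}$$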
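
The LR-representation lends itself to a direct implementation. The following sketch assumes the membership function of definition 5-6 with the parameters of example 5-7; the right reference function R(x) = 1/(1 + 2|x|) is a reconstruction of a garbled formula and should be read as an assumption, and the name `lr_membership` is illustrative.

```python
def lr_membership(x, m, alpha, beta, left, right):
    """Membership of x in the LR fuzzy number (m, alpha, beta)_LR."""
    if x <= m:
        return left((m - x) / alpha)   # left branch: L((m - x) / alpha)
    return right((x - m) / beta)       # right branch: R((x - m) / beta)

def L(z):
    return 1 / (1 + z ** 2)

def R(z):
    # Assumed right reference function (reconstructed, see lead-in)
    return 1 / (1 + 2 * abs(z))

for x in (3, 5, 8):
    print(x, lr_membership(x, m=5, alpha=2, beta=3, left=L, right=R))
```

Passing the reference functions as arguments keeps the sketch usable for any other choice of L and R, such as the functions max(0, 1 - x) implied by trapezoidal and triangular fuzzy numbers.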





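If $m$ is not a real number but an interval $[\underline{m}, \overline{m}]$, then the fuzzy set $M$ is
not a fuzzy number but a fuzzy interval. Accordingly, a fuzzy interval in LR-representation
can be defined as follows:


Definition 5-6a [Dubois and Prade 1988a, p. 48]


A fuzzy interval $M$ is of LR-type if there exist shape functions $L$ and $R$ and four
parameters $(\underline{m}, \overline{m}) \in \mathbb{R}^2 \cup \{-\infty, +\infty\}$, $\alpha$, $\beta$ and the membership function of $M$ is

$$\mu_M(x) = \begin{cases} L\left(\dfrac{\underline{m} - x}{\alpha}\right) & \text{for } x \leq \underline{m} \\[2ex] 1 & \text{for } \underline{m} \leq x \leq \overline{m} \\[1ex] R\left(\dfrac{x - \overline{m}}{\beta}\right) & \text{for } x \geq \overline{m} \end{cases}$$

The fuzzy interval is then denoted by

$$M = (\underline{m}, \overline{m}, \alpha, \beta)_{LR}$$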

This definition is very general and allows quantification of quite different types
of information; for instance, if $M$ is supposed to be a real crisp number for
$m \in \mathbb{R}$,

$$M = (m, m, 0, 0)_{LR}, \quad \forall L, \forall R$$

If $M$ is a crisp interval,

$$M = (a, b, 0, 0)_{LR}, \quad \forall L, \forall R$$

and if $M$ is a "trapezoidal fuzzy number" (see definition 5-3), $L(x) = R(x) = \max(0, 1 - x)$
is implied.

For LR fuzzy numbers, the computations necessary for the above-mentioned
operations are considerably simplified: Dubois and Prade [1979] showed that
exact formulas can be given for $\oplus$ and $\ominus$. They also suggested approximate
expressions for $\odot$ and $\oslash$ [Dubois and Prade 1979], which approximate better
when the spreads are smaller compared to the mean values.
Theorem 5-4
Let $M$, $N$ be two fuzzy numbers of LR-type:

$M = (m, \alpha, \beta)_{LR}$, $N = (n, \gamma, \delta)_{LR}$
Then


1. $(m, \alpha, \beta)_{LR} \oplus (n, \gamma, \delta)_{LR} = (m + n, \alpha + \gamma, \beta + \delta)_{LR}$.
2. $-(m, \alpha, \beta)_{LR} = (-m, \beta, \alpha)_{LR}$.
3. $(m, \alpha, \beta)_{LR} \ominus (n, \gamma, \delta)_{LR} = (m - n, \alpha + \delta, \beta + \gamma)_{LR}$.
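
With LR-type numbers, extended addition and subtraction reduce to arithmetic on three parameters. The sketch below implements the three formulas of theorem 5-4; the class name `LR`, the function names, and the numerical values of `M` and `N` are hypothetical illustrations and are not taken from the text. It also assumes that both operands share the same reference functions.

```python
from typing import NamedTuple

class LR(NamedTuple):
    """LR fuzzy number (m, alpha, beta): mean value, left spread, right spread."""
    m: float
    alpha: float
    beta: float

def lr_add(a, b):
    # (m, a, b) + (n, c, d) = (m + n, a + c, b + d)
    return LR(a.m + b.m, a.alpha + b.alpha, a.beta + b.beta)

def lr_neg(a):
    # -(m, a, b) = (-m, b, a): the spreads swap sides
    return LR(-a.m, a.beta, a.alpha)

def lr_sub(a, b):
    # (m, a, b) - (n, c, d) = (m - n, a + d, b + c)
    return LR(a.m - b.m, a.alpha + b.beta, a.beta + b.alpha)

# Hypothetical values chosen only for illustration
M = LR(1.0, 0.5, 0.8)
N = LR(2.0, 0.6, 0.2)
print(lr_add(M, N), lr_sub(M, N))
```

Subtraction coincides with adding the opposite, so `lr_sub(M, N)` and `lr_add(M, lr_neg(N))` give the same triple; the spreads always add up, which reflects the fact that no inverse element exists for the extended addition.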


Example 5-8

$$L(x) = R(x) = \frac{1}{1 + x^2}$$


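Theorem 5-5 [Dubois and Prade 1980a, p. 55]
Let $M$, $N$ be fuzzy numbers as in definition 5-3; then

$$(m, \alpha, \beta)_{LR} \odot (n, \gamma, \delta)_{LR} \approx (mn,\ m\gamma + n\alpha,\ m\delta + n\beta)_{LR}$$

for $M$, $N$ positive;

$$(m, \alpha, \beta)_{LR} \odot (n, \gamma, \delta)_{LR} \approx (mn,\ n\alpha - m\delta,\ n\beta - m\gamma)_{LR}$$

for $N$ positive, $M$ negative, and

$$(m, \alpha, \beta)_{LR} \odot (n, \gamma, \delta)_{LR} \approx (mn,\ -n\beta - m\delta,\ -n\alpha - m\gamma)_{LR}$$

for $M$, $N$ negative.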


The following example shows an application of theorem 5-5. 


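Example 5-9
Let $M = (2, .2, .1)_{LR}$

$N = (3, .1, .3)_{LR}$
be fuzzy numbers of LR-type with reference functions

$$L(z) = R(z) = \begin{cases} 1 & -1 \leq z \leq 1 \\ 0 & \text{else} \end{cases}$$

If we are interested in the LR-representation of $M \odot N$, we verify the conditions
of theorem 5-5 and apply it. Thus, with

$$\mu_M(x) = \begin{cases} L\left(\dfrac{2 - x}{.2}\right) & x \leq 2 \\[2ex] R\left(\dfrac{x - 2}{.1}\right) & x \geq 2 \end{cases} = \begin{cases} 1 & -1 \leq \dfrac{2 - x}{.2} \leq 1 \text{ and } -1 \leq \dfrac{x - 2}{.1} \leq 1 \\ 0 & \text{else} \end{cases} = \begin{cases} 1 & 1.9 \leq x \leq 2.1 \\ 0 & \text{else} \end{cases}$$

it follows that $M$ is positive.

$$\mu_N(x) = \begin{cases} L\left(\dfrac{3 - x}{.1}\right) & x \leq 3 \\[2ex] R\left(\dfrac{x - 3}{.3}\right) & x \geq 3 \end{cases} = \begin{cases} 1 & 2.9 \leq x \leq 3.1 \\ 0 & \text{else} \end{cases}$$

shows that $N$ is positive.
Following theorem 5-5 for the case in which $M$ and $N$ are positive, we obtain

$$M \odot N \approx (2 \cdot 3,\ 2 \cdot .1 + 3 \cdot .2,\ 2 \cdot .3 + 3 \cdot .1)_{LR} = (6, .8, .9)_{LR}$$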
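
The arithmetic of example 5-9 can be reproduced with a short function. The sketch covers only the case of two positive LR fuzzy numbers from theorem 5-5 and represents a number as a plain tuple (m, alpha, beta); the function name `lr_mul_positive` is illustrative.

```python
def lr_mul_positive(a, b):
    """Approximate product of two positive LR numbers (theorem 5-5, first case).

    (m, alpha, beta) * (n, gamma, delta) ~ (m n, m gamma + n alpha, m delta + n beta)
    """
    m, alpha, beta = a
    n, gamma, delta = b
    return (m * n, m * gamma + n * alpha, m * delta + n * beta)

M = (2, .2, .1)
N = (3, .1, .3)
print(tuple(round(v, 3) for v in lr_mul_positive(M, N)))
```

The formula is a first-order approximation in the spreads, which is why it works best when the spreads are small compared with the mean values.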


Exercises


1. Let $X = N \times N$
   $A_1$ = {(1, .6), (2, .8), (3, 1), (4, .6)}
   $A_2$ = {(0, .5), (1, .7), (2, .9), (3, 1), (4, .4)}
   $f: N \times N \to N$ be defined by

   $f(x, y) = z$, $x \in A_1$, $y \in A_2$

   Determine the image $f(A_1 \times A_2)$ by the extension principle.
2. Compute $\mu_{A \cup B}$ and $\mu_{\complement A}$ for $A$, $B$ as in example 5-2.
3. Which of the following fuzzy sets are fuzzy numbers?
   a. $A = \{(x, \mu_A(x)) \mid x \in \mathbb{R}\}$


   where
-1 


i (14554) x<5 


(PEI sas 
3 











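   b. $B = \{(x, \mu_B(x)) \mid x \in \mathbb{R}^+\}$
   where

   $$\mu_B(x) = \begin{cases} x & x \in [0, 1] \\ 1 & x \in [1, 2] \\ 3 - x & x \in [2, 3] \end{cases}$$

   c. C = {(0, .4), (1, 1), (2, .7)}


4. Which of the following functions are reference functions for $x \in \mathbb{R}$?
   a. $f_1(x) = |x + 1|$

   b. $f_2(x) = \dfrac{1}{1 + x^2}$

   c. $f_3(x) = \begin{cases} \frac{1}{2}x + 1 & x \in [-2, 0] \\ -2x + 1 & x \in [0, \frac{1}{2}] \\ 0 & \text{else} \end{cases}$

   d. $f_4(x) = \dfrac{1}{1 + a|x|^p}$, $p \geq 1$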
Let M, L(x), R(x) be defined as in example 5-8. N = (-4, .1, .6);r. Compute 
MON. 
Let M, N be defined as in example 5-8. Compute MON. i 
Develop an approximate formula to compute $M \oslash N$, $M = (m, \alpha, \beta)_{LR}$,
$N = (n, \gamma, \delta)_{LR}$. (Remember how the formula was derived for the general extended
division.)


6 FUZZY RELATIONS 
AND FUZZY GRAPHS 


6.1 Fuzzy Relations on Sets and Fuzzy Sets 


Fuzzy relations are fuzzy subsets of X x Y, that is, mappings from X —> Y. They 
have been studied by a number of authors, in particular by Zadeh [1965, 1971], 
Kaufmann [1975], and Rosenfeld [1975]. Applications of fuzzy relations are 
widespread and important. We shall consider some of them and point to more 
possible uses at the end of this chapter. As an example, we shall consider only binary
relations. A generalization to n-ary relations is straightforward.


Definition 6-1
Let $X, Y \subseteq \mathbb{R}$ be universal sets; then

$$R = \{((x, y), \mu_R(x, y)) \mid (x, y) \in X \times Y\}$$

is called a fuzzy relation on $X \times Y$.


Example 6-1 


Let $X = Y = \mathbb{R}$ and $R$: = "considerably larger than." The membership function of
the fuzzy relation, which is, of course, a fuzzy set on $X \times Y$, can then be

$$\mu_R(x, y) = \begin{cases} 0 & \text{for } x \leq y \\[1ex] \dfrac{x - y}{10y} & \text{for } y < x \leq 11y \\[2ex] 1 & \text{for } x > 11y \end{cases}$$

A different membership function for this relation could be

$$\mu_R(x, y) = \begin{cases} 0 & \text{for } x \leq y \\ \left(1 + (y - x)^{-2}\right)^{-1} & \text{for } x > y \end{cases}$$

For discrete supports, fuzzy relations can also be defined by matrices.


Example 6-2 
Let $X = \{x_1, x_2, x_3\}$ and $Y = \{y_1, y_2, y_3, y_4\}$





and 


Xi 
Z = “y very close to x”: x, 


X3 





In definition 6-1 it was assumed that ug was a mapping from X x Y to [0, 1]; that 
is, the definition assigns to each pair (x, y) a degree of membership in the unit 
interval. In some instances, such as in graph theory, it is useful to consider fuzzy 
relations that map from fuzzy sets contained in the universal sets into the unit 
interval. Then definition 6-1 has to be generalized [Rosenfeld 1975]. 
Definition 6-2

Let $X, Y \subseteq \mathbb{R}$ and
$A = \{(x, \mu_A(x)) \mid x \in X\}$,
$B = \{(y, \mu_B(y)) \mid y \in Y\}$, two fuzzy sets.


Then $R = \{[(x, y), \mu_R(x, y)] \mid (x, y) \in X \times Y\}$ is a fuzzy relation on $A$ and $B$ if
$\mu_R(x, y) \leq \mu_A(x)$, $\forall (x, y) \in X \times Y$
and
$\mu_R(x, y) \leq \mu_B(y)$, $\forall (x, y) \in X \times Y$.


This definition will be particularly useful when defining fuzzy graphs: Let the 
elements of the fuzzy relation of definition 6-2 be the nodes of a fuzzy graph that 
is represented by this fuzzy relation. The degrees of membership of the elements 
of the related fuzzy sets define the “strength” of or the flow in the respective 
nodes of the graph, while the degrees of membership of the corresponding pairs 
in the relation are the "flows" or "capacities" of the edges. The additional requirement
of definition 6-2 ($\mu_R(x, y) \leq \min\{\mu_A(x), \mu_B(y)\}$) then ensures that the "flows"
in the edges of the graph can never exceed the flows in the respective nodes.

Fuzzy relations are obviously fuzzy sets in product spaces. Therefore set- 
theoretic and algebraic operations can be defined for them in analogy to the def- 
initions in chapters 2 and 3 by utilizing the extension principle. 


Definition 6-3 
Let $R$ and $Z$ be two fuzzy relations in the same product space. The union/intersection
of $R$ with $Z$ is then defined by

$$\mu_{R \cup Z}(x, y) = \max\{\mu_R(x, y), \mu_Z(x, y)\}, \quad (x, y) \in X \times Y$$

$$\mu_{R \cap Z}(x, y) = \min\{\mu_R(x, y), \mu_Z(x, y)\}, \quad (x, y) \in X \times Y$$
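
For relations given as matrices, definition 6-3 is an elementwise maximum or minimum. A minimal sketch, assuming relations are stored as lists of rows; the two small matrices are hypothetical and serve only as an illustration.

```python
def relation_union(r, z):
    """Elementwise max of two relational matrices of the same shape."""
    return [[max(a, b) for a, b in zip(row_r, row_z)] for row_r, row_z in zip(r, z)]

def relation_intersection(r, z):
    """Elementwise min of two relational matrices of the same shape."""
    return [[min(a, b) for a, b in zip(row_r, row_z)] for row_r, row_z in zip(r, z)]

# Hypothetical 2 x 3 relations (not taken from the text)
R = [[.8, 1.0, .1],
     [0.0, .8, 0.0]]
Z = [[.4, 0.0, .9],
     [.9, .4, .5]]

print(relation_union(R, Z))
print(relation_intersection(R, Z))
```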


Example 6-3 


Let R and Z be the two fuzzy relations defined in example 6-2. The union of R 
and Z, which can be interpreted as “x considerably larger or very close to y,” is 
then given by 
So far, "min" and "max" have been used to define intersection and union. Since
fuzzy relations are fuzzy sets, operations can also be defined using the alternative
definitions in section 3.2. Some additional concepts, such as the projection
and the cylindrical extension of fuzzy relations, have been shown to be useful. 


Definition 6-4 


Let $R = \{[(x, y), \mu_R(x, y)] \mid (x, y) \in X \times Y\}$ be a fuzzy binary relation. The first
projection of $R$ is then defined as

$$R^{(1)} = \{(x, \max_y \mu_R(x, y)) \mid (x, y) \in X \times Y\}$$

The second projection is defined as

$$R^{(2)} = \{(y, \max_x \mu_R(x, y)) \mid (x, y) \in X \times Y\}$$

and the total projection as

$$R^{(T)} = \max_x \max_y \{\mu_R(x, y) \mid (x, y) \in X \times Y\}$$


Example 6-4 


Let R be a fuzzy relation defined by the following relational matrix. The first, 
second, and total projections are then shown at the appropriate places below. 
First projection $[\mu_{R^{(1)}}(x)]$: $x_1$: 1, $x_2$: 1, $x_3$: 1

Second projection $[\mu_{R^{(2)}}(y)]$: .4, .8, 1, 1, 1, .8

Total projection: 1
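
The three projections of definition 6-4 are row maxima, column maxima, and the overall maximum of the relational matrix. In the sketch below the matrix `R` is an assumed illustration: it was chosen so that its projections agree with the values listed in example 6-4, but it is not confirmed by the text as extracted here.

```python
def first_projection(relation):
    """Row maxima: membership of each x in R^(1)."""
    return [max(row) for row in relation]

def second_projection(relation):
    """Column maxima: membership of each y in R^(2)."""
    return [max(column) for column in zip(*relation)]

def total_projection(relation):
    """Largest degree of membership in the whole relation."""
    return max(max(row) for row in relation)

# Assumed 3 x 6 relational matrix (rows x1..x3, columns y1..y6)
R = [[.1, .2, .4, .8, 1.0, .8],
     [.2, .4, .8, 1.0, .8, .6],
     [.4, .8, 1.0, .8, .4, .2]]

print(first_projection(R), second_projection(R), total_projection(R))
```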


The relation resulting from applying an operation of projection to another relation
is also called a "shadow" [Zadeh 1973a]. Let us now consider a more general
space, namely, $X = X_1 \times \dots \times X_n$; and let $R_q$ be a projection on $X_{i_1} \times \dots \times X_{i_k}$,
where $(i_1, \dots, i_k)$ is a subsequence of $(1, \dots, n)$. It is obvious that distinct fuzzy
relations in the same universe can have the same projection. There must, however,
be a uniquely defined largest relation $R_{qL}(X_1, \dots, X_n)$ with $\mu_{R_q}(x_{i_1}, \dots, x_{i_k})$ for
each projection. This largest relation is called the cylindrical extension of the projection
relation.


Definition 6-5


$R_{qL} \subseteq X$ is the largest relation in $X$ of which the projection is $R_q$. $R_{qL}$ is then called
the cylindrical extension of $R_q$, and $R_q$ is the base of $R_{qL}$.
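
For a binary relation, the cylindrical extension of a second projection repeats the projected memberships in every row: this is the largest matrix whose column maxima equal the projection. A sketch under that assumption, with hypothetical helper names and a hypothetical 2 x 3 relation:

```python
def second_projection(relation):
    """Column maxima of a relational matrix."""
    return [max(column) for column in zip(*relation)]

def cylindrical_extension(projection, n_rows):
    """Largest relation with n_rows rows whose second projection is `projection`."""
    return [list(projection) for _ in range(n_rows)]

# Hypothetical relation, used only to illustrate the construction
R = [[.1, .8, .3],
     [.6, .2, 1.0]]

base = second_projection(R)
extension = cylindrical_extension(base, len(R))
print(base, extension)
```

Projecting the extension again returns the base, and every entry of the original relation is bounded by the corresponding entry of the extension.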


Example 6-5 


The cylindrical extension of R? (example 6-4) is 
Definition 6-6


Let $R$ be a fuzzy relation on $X = X_1 \times \dots \times X_n$ and $R_1$ and $R_2$ be two fuzzy projections
on $X_1 \times \dots \times X_r$ and $X_s \times \dots \times X_n$, respectively, with $s \leq r + 1$ and $R_{1L}$,
$R_{2L}$ their respective cylindrical extensions.

The meet of $R_1$ and $R_2$ is then defined as $R_{1L} \cap R_{2L}$, and their join as $R_{1L} \cup R_{2L}$.


6.1.1 Compositions of Fuzzy Relations 


Fuzzy relations in different product spaces can be combined with each other by 
the operation “composition.” Different versions of “composition” have been sug- 
gested, which differ in their results and also with respect to their mathematical 
properties. The max-min composition has become the best known and the most 
frequently used one. However, often the so-called max-product or max-average 
compositions lead to results that are more appealing. 


Definition 6—7 


Max-min composition: Let $R_1(x, y)$, $(x, y) \in X \times Y$ and $R_2(y, z)$, $(y, z) \in Y \times Z$
be two fuzzy relations. The max-min composition $R_1$ max-min $R_2$ is then the
fuzzy set

$$R_1 \circ R_2 = \{[(x, z), \max_y \{\min\{\mu_{R_1}(x, y), \mu_{R_2}(y, z)\}\}] \mid x \in X,\ y \in Y,\ z \in Z\}$$

$\mu_{R_1 \circ R_2}$ is again the membership function of a fuzzy relation on fuzzy sets (definition
6-2).


A more general definition of composition is the “max-* composition.” 


Definition 6-8 
Let $R_1$ and $R_2$ be defined as in definition 6-7. The max-* composition of $R_1$ and
$R_2$ is then defined as

$$R_1 \circledast R_2 = \{[(x, z), \max_y (\mu_{R_1}(x, y) * \mu_{R_2}(y, z))] \mid x \in X,\ y \in Y,\ z \in Z\}$$


If * is an associative operation that is monotonically nondecreasing in each argu- 
ment, then the max-* composition corresponds essentially to the max-min com- 
position. Two special cases of the max-* composition are proposed in the next 
definition. 




Definition 6-9 


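[Rosenfeld 1975]: Let $R_1$ and $R_2$, respectively, be defined as in definition 6-7.
The max-prod composition $R_1 \underset{\cdot}{\circ} R_2$ and the max-av composition $R_1 \underset{av}{\circ} R_2$ are then
defined as follows:

$$R_1 \underset{\cdot}{\circ} R_2\,(x, z) = \max_y \left[\mu_{R_1}(x, y) \cdot \mu_{R_2}(y, z)\right], \quad x \in X,\ y \in Y,\ z \in Z$$

$$R_1 \underset{av}{\circ} R_2\,(x, z) = \frac{1}{2} \cdot \max_y \left[\mu_{R_1}(x, y) + \mu_{R_2}(y, z)\right], \quad x \in X,\ y \in Y,\ z \in Z$$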


Example 6-6 


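Let $R_1(x, y)$ and $R_2(y, z)$ be defined by the following relational matrices
[Kaufmann 1975, p. 62]:

$R_1$: relational matrix with rows $x_1, \dots, x_3$ and columns $y_1, \dots, y_5$

$R_2$: relational matrix with rows $y_1, \dots, y_5$ and columns $z_1, \dots, z_4$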
We shall first compute the max-min composition $R_1 \circ R_2(x, z)$. We shall show in
detail the determination for $x = x_1$, $z = z_1$ and leave it to the reader to verify the
total results shown in the matrix at the end of the detailed computations. We first
perform the min operation in the minor brackets of definition 6-7:
Let $x = x_1$, $z = z_1$, and $y = y_i$, $i = 1, \dots, 5$:

$$\min\{\mu_{R_1}(x_1, y_1), \mu_{R_2}(y_1, z_1)\} = \min\{.1, .9\} = .1$$
$$\min\{\mu_{R_1}(x_1, y_2), \mu_{R_2}(y_2, z_1)\} = \min\{.2, .2\} = .2$$
$$\min\{\mu_{R_1}(x_1, y_3), \mu_{R_2}(y_3, z_1)\} = \min\{0, .8\} = 0$$
$$\min\{\mu_{R_1}(x_1, y_4), \mu_{R_2}(y_4, z_1)\} = \min\{1, .4\} = .4$$
$$\min\{\mu_{R_1}(x_1, y_5), \mu_{R_2}(y_5, z_1)\} = \min\{.7, 0\} = 0$$

$$R_1 \circ R_2(x_1, z_1) = ((x_1, z_1), \mu_{R_1 \circ R_2}(x_1, z_1)) = ((x_1, z_1), \max\{.1, .2, 0, .4, 0\}) = ((x_1, z_1), .4)$$
In analogy to the above computation we now determine the grades of membership
for all pairs $(x_i, z_j)$, $i = 1, \dots, 3$, $j = 1, \dots, 4$ and arrive at





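For the max-prod, we obtain
$x = x_1$, $z = z_1$, $y = y_i$, $i = 1, \dots, 5$:

$$\mu_{R_1}(x_1, y_1) \cdot \mu_{R_2}(y_1, z_1) = .1 \cdot .9 = .09$$
$$\mu_{R_1}(x_1, y_2) \cdot \mu_{R_2}(y_2, z_1) = .2 \cdot .2 = .04$$
$$\mu_{R_1}(x_1, y_3) \cdot \mu_{R_2}(y_3, z_1) = 0 \cdot .8 = 0$$
$$\mu_{R_1}(x_1, y_4) \cdot \mu_{R_2}(y_4, z_1) = 1 \cdot .4 = .4$$
$$\mu_{R_1}(x_1, y_5) \cdot \mu_{R_2}(y_5, z_1) = .7 \cdot 0 = 0$$

Hence

$$R_1 \underset{\cdot}{\circ} R_2(x_1, z_1) = ((x_1, z_1), \mu_{R_1 \underset{\cdot}{\circ} R_2}(x_1, z_1)) = ((x_1, z_1), \max\{.09, .04, 0, .4, 0\}) = ((x_1, z_1), .4)$$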


After performing the remaining computations, we obtain 





The max-av composition finally yields

 $i$    $\mu_{R_1}(x_1, y_i) + \mu_{R_2}(y_i, z_1)$
 1    1
 2    .4
 3    .8
 4    1.4
 5    .7

Hence

$$\frac{1}{2} \cdot \max_y \{\mu_{R_1}(x_1, y_i) + \mu_{R_2}(y_i, z_1)\} = \frac{1}{2} \cdot (1.4) = .7$$
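
The three compositions differ only in the inner operation and a scaling factor, so one generic routine covers them all. The sketch below uses just the entries that appear in the detailed computation above (the row of $R_1$ for $x_1$ and the column of $R_2$ for $z_1$); the function name `compose` is illustrative.

```python
import operator

def compose(r1, r2, combine, scale=1.0):
    """Generic max-* composition of two relational matrices.

    combine is the inner operation (min, product, sum); scale multiplies
    the maximum (0.5 for the max-av composition).
    """
    inner = range(len(r2))
    return [[scale * max(combine(r1[i][k], r2[k][j]) for k in inner)
             for j in range(len(r2[0]))]
            for i in range(len(r1))]

# Row x1 of R1 and column z1 of R2, as used in the worked computation
r1 = [[.1, .2, 0, 1, .7]]
r2 = [[.9], [.2], [.8], [.4], [0]]

max_min = compose(r1, r2, min)
max_prod = compose(r1, r2, operator.mul)
max_av = compose(r1, r2, operator.add, scale=0.5)
print(max_min, max_prod, max_av)
```

With full matrices in place of the single row and column, the same call returns the complete composed relation.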





6.1.2 Properties of the Min-Max Composition 

(For proofs and more details see, for instance, Rosenfeld 1975.) 

Associativity. The max-min composition is associative, that is,
$(R_3 \circ R_2) \circ R_1 = R_3 \circ (R_2 \circ R_1)$.


Hence $R_1 \circ R_1 \circ R_1 = R_1^3$, and the third power of a fuzzy relation is defined.


Reflexivity 


Definition 6-10 


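Let $R$ be a fuzzy relation in $X \times X$.


1. $R$ is called reflexive [Zadeh 1971] if
   $\mu_R(x, x) = 1 \ \forall x \in X$

2. $R$ is called $\varepsilon$-reflexive [Yeh 1975] if
   $\mu_R(x, x) \geq \varepsilon \ \forall x \in X$

3. $R$ is called weakly reflexive [Yeh 1975] if
   $\mu_R(x, y) \leq \mu_R(x, x)$ and $\mu_R(y, x) \leq \mu_R(x, x)$ $\forall x, y \in X$.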


Example 6-7 
Let $X = \{x_1, x_2, x_3, x_4\}$ and $Y = \{y_1, y_2, y_3, y_4\}$.
The following relation "y is close to x" is reflexive:





If R, and R, are reflexive fuzzy relations, then the max-min composition R, ° R, 
is also reflexive. 


Symmetry 


Definition 6-11 
A fuzzy relation $R$ is called symmetric if $R(x, y) = R(y, x)$ $\forall x, y \in X$.


Definition 6-12
A relation is called antisymmetric if for $x \neq y$

either $\mu_R(x, y) \neq \mu_R(y, x)$
or $\mu_R(x, y) = \mu_R(y, x) = 0$ $\quad \forall x, y \in X$

[Kaufmann 1975, p. 105].
A relation is called perfectly antisymmetric if for $x \neq y$, whenever

$\mu_R(x, y) > 0$ then $\mu_R(y, x) = 0$ $\forall x, y \in X$
[Zadeh 1971].
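
For relations on a finite set, the definitions of reflexivity, symmetry, and the two kinds of antisymmetry translate into simple checks on the relational matrix. The following sketch uses hypothetical matrices chosen for illustration; they are not the matrices of the examples in the text.

```python
def off_diagonal_pairs(n):
    return [(i, j) for i in range(n) for j in range(n) if i != j]

def is_reflexive(r):
    return all(r[i][i] == 1 for i in range(len(r)))

def is_symmetric(r):
    return all(r[i][j] == r[j][i] for i, j in off_diagonal_pairs(len(r)))

def is_antisymmetric(r):
    # For x != y: memberships differ, or both are zero (definition 6-12)
    return all(r[i][j] != r[j][i] or r[i][j] == r[j][i] == 0
               for i, j in off_diagonal_pairs(len(r)))

def is_perfectly_antisymmetric(r):
    # For x != y: mu(x, y) > 0 implies mu(y, x) = 0
    return all(not (r[i][j] > 0 and r[j][i] > 0)
               for i, j in off_diagonal_pairs(len(r)))

# Hypothetical matrices
perfect = [[.4, 0, .1], [.8, 1, 0], [0, .6, .3]]
anti_only = [[.4, .2], [.8, 1]]
symmetric = [[1, .3], [.3, 1]]

print(is_perfectly_antisymmetric(perfect), is_antisymmetric(anti_only),
      is_symmetric(symmetric))
```

Every perfectly antisymmetric relation is also antisymmetric, while `anti_only` shows that the converse fails: both off-diagonal memberships are positive but unequal.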


Example 6-8 


xX 
R: X2 
X3 


X4 
$R_1$ is a perfectly antisymmetric relation, while $R_2$ is an antisymmetric, but not
perfectly antisymmetric relation. $R_3$ is a nonsymmetric relation, that is, there exist
$x, y \in X$ with $\mu_R(x, y) \neq \mu_R(y, x)$, which is not antisymmetric and therefore also
not perfectly antisymmetric.


One could certainly define other concepts, such as an $\alpha$-antisymmetry
($|\mu_R(x, y) - \mu_R(y, x)| \geq \alpha$ $\forall x, y \in X$). These concepts would probably be more in line with
the basic ideas of fuzzy set theory. Since we will not need this type of definition 
for our further considerations, we will abstain from any further definition in this 
direction. 


Example 6-9 


Let X and Y be defined as in example 6-8. The following relation is then a symmetric
relation:

[Membership matrix of the symmetric relation $R(x, y)$ on $\{x_1, x_2, x_3, x_4\}$.]







Remark 6-1 


For max-min compositions, the following properties hold: 


1. If $R_1$ is reflexive and $R_2$ is an arbitrary fuzzy relation, then $R_1 \circ R_2 \supseteq R_2$ and
$R_2 \circ R_1 \supseteq R_2$.

2. If R is reflexive, then $R \subseteq R \circ R$.

3. If $R_1$ and $R_2$ are reflexive relations, so is $R_1 \circ R_2$.
4. If $R_1$ and $R_2$ are symmetric, then $R_1 \circ R_2$ is symmetric if $R_1 \circ R_2 = R_2 \circ R_1$.
5. If R is symmetric, so is each power of R.

Transitivity 


Definition 6-13 
A fuzzy relation R is called (max-min) transitive if

$$R \circ R \subseteq R$$
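
The max-min composition and the transitivity test of definition 6-13 can be sketched in a few lines of Python. The matrices `R` and `S` below are hypothetical illustrations chosen for this sketch (they are not the matrices of the examples in the text), and the function names are assumptions.

```python
def max_min(r1, r2):
    """Max-min composition of two fuzzy relations given as nested lists."""
    return [[max(min(r1[i][k], r2[k][j]) for k in range(len(r2)))
             for j in range(len(r2[0]))]
            for i in range(len(r1))]


def is_subset(r, s):
    """True if mu_r(x, y) <= mu_s(x, y) for every pair (x, y)."""
    return all(a <= b for row_r, row_s in zip(r, s)
               for a, b in zip(row_r, row_s))


def is_transitive(r):
    """Max-min transitivity: R o R is contained in R."""
    return is_subset(max_min(r, r), r)


# Hypothetical relation that turns out to be max-min transitive.
R = [[0.2, 1.0, 0.4],
     [0.0, 0.6, 0.3],
     [0.0, 1.0, 0.3]]

# Hypothetical reflexive and symmetric relation that is not transitive.
S = [[1.0, 0.8, 0.0],
     [0.8, 1.0, 0.8],
     [0.0, 0.8, 1.0]]

print(max_min(R, R))
print(is_transitive(R), is_transitive(S))
```

For the reflexive relation `S`, `is_subset(S, max_min(S, S))` also holds, which is the containment stated in remark 6-1, property 2.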


Example 6-10 


Let the fuzzy relation R be defined as


Then $R \circ R$ is

Now one can easily see that $\mu_{R \circ R}(x, y) \le \mu_R(x, y)$ holds for all $x, y \in X$.


Remark 6-2 


Combinations of the above properties give some interesting results for max-min 
compositions: 


1. If R is symmetric and transitive, then $\mu_R(x, y) \le \mu_R(x, x)$ for all $x, y \in X$.
2. If R is reflexive and transitive, then $R \circ R = R$.
3. If $R_1$ and $R_2$ are transitive and $R_1 \circ R_2 = R_2 \circ R_1$, then $R_1 \circ R_2$ is transitive.


The properties mentioned in remarks 6-1 and 6-2 hold for the max-min composition.
For the max-prod composition, property 3 of remark 6-2 is also true but
not properties 1 and 3 of remark 6-1 or property 1 of remark 6-2. For the max-av
composition, properties 1 and 3 of remark 6-1 hold as well as properties 1 and
3 of remark 6-2. Property 5 of remark 6-1 is true for any commutative operator.


6.2 Fuzzy Graphs 


It was already mentioned that definitions 6—1 and 6-2 of a fuzzy relation can also 
be interpreted as defining a fuzzy graph. In order to stay in line with the termi- 
nology of traditional graph theory we shall use the following definition of a fuzzy 
graph. 


Definition 6-14 
Let E be the (crisp) set of nodes. A fuzzy graph is then defined by

$$\tilde G(x_i, x_j) = \{((x_i, x_j), \mu_{\tilde G}(x_i, x_j)) \mid (x_i, x_j) \in E \times E\}$$

If $\tilde E$ is a fuzzy set, a fuzzy graph would have to be defined in analogy to definition 6-2.


Example 6-11 


a. Let E= {A, B, C}. 
Considering only three possible degrees of membership, graphs could be 
described as shown in figure 6-1. 

Figure 6-1. Fuzzy graphs.

b. Let $E = \{x_1, x_2, x_3, x_4\}$; then a fuzzy graph could be described as

$$\tilde G(x_i, x_j) = \{[(x_1, x_2), .3], [(x_1, x_3), .6], [(x_1, x_1), 1], [(x_2, x_1), .4], [(x_3, x_1), .2], [(x_3, x_2), .5], [(x_4, x_3), .8]\}$$


Example 6—11a shows directed fuzzy binary graphs. Graphs can, of course, also 
be defined in higher-dimension product spaces. We shall, however, focus our 
attention on finite undirected binary graphs; that is, we shall assume in the fol- 
lowing that the fuzzy relation representing a graph is symmetric. The arcs can 
then be considered as unordered pairs of nodes. In analogy to traditional graph 
theory, fuzzy graph theoretic concepts can be defined. 


Definition 6-15 
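$\tilde H(x_i, x_j)$ is a fuzzy subgraph of $\tilde G(x_i, x_j)$ if

$$\mu_{\tilde H}(x_i, x_j) \le \mu_{\tilde G}(x_i, x_j) \quad \forall (x_i, x_j) \in E \times E$$

$\tilde H(x_i, x_j)$ spans graph $\tilde G(x_i, x_j)$ if the node sets of $\tilde H(x_i, x_j)$ and $\tilde G(x_i, x_j)$ are equal,
that is, if they differ only in their arc weights.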
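
The subgraph condition can be checked mechanically. The following is a minimal sketch that stores a fuzzy graph as a dictionary from arcs to weights; the weights are those of examples 6-11b and 6-12 as far as they are legible (the arcs `(x1, x3)` and `(x1, x1)` of G are assumptions), and the function name is hypothetical.

```python
# Arc weights of the fuzzy graph G (example 6-11b); missing arcs have weight 0.
G = {("x1", "x2"): 0.3, ("x1", "x3"): 0.6, ("x1", "x1"): 1.0,
     ("x2", "x1"): 0.4, ("x3", "x1"): 0.2, ("x3", "x2"): 0.5,
     ("x4", "x3"): 0.8}

# Arc weights of the candidate subgraph H (example 6-12).
H = {("x1", "x2"): 0.2, ("x1", "x3"): 0.4, ("x3", "x2"): 0.4,
     ("x4", "x3"): 0.7}


def is_fuzzy_subgraph(h, g):
    """True if mu_H(arc) <= mu_G(arc) for every arc carried by h."""
    return all(w <= g.get(arc, 0.0) for arc, w in h.items())


print(is_fuzzy_subgraph(H, G))  # H never exceeds G on any arc
print(is_fuzzy_subgraph(G, H))  # G exceeds H, for instance on (x1, x2)
```

Both graphs are understood over the same node set E, so a graph that passes this test is also a spanning subgraph in the sense of the definition.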


Example 6-12 


Let $\tilde G(x_i, x_j)$ be defined as in example 6-11b. A spanning subgraph of $\tilde G(x_i, x_j)$ is
then

$$\tilde H(x_i, x_j) = \{[(x_1, x_2), .2], [(x_1, x_3), .4], [(x_3, x_2), .4], [(x_4, x_3), .7]\}$$


Definition 6-16


A path in a fuzzy graph $\tilde G(x_i, x_j)$ is a sequence of distinct nodes, $x_0, x_1, \ldots, x_n$,
such that for all $(x_i, x_{i+1})$, $\mu_{\tilde G}(x_i, x_{i+1}) > 0$. The strength of the path is
$\min \{\mu_{\tilde G}(x_i, x_{i+1})\}$ for all nodes contained in the path. The length of a path $n > 0$ is the number
of nodes contained in the path. Each pair of nodes $(x_i, x_{i+1})$ with $\mu_{\tilde G}(x_i, x_{i+1}) > 0$ is called
an edge (arc) of the graph. A path is called a cycle if $x_0 = x_n$ and $n \ge 3$.

It would be straightforward to call the length of the shortest path between two 
nodes of the graph the distance between these nodes. This definition, however, 
has some disadvantages. It is therefore more reasonable to define the distance 
between two nodes as follows [Rosenfeld 1975, p. 58]: 


Definition 6-17 
The $\mu$-length of a path $p = x_0, \ldots, x_n$ is equal to

$$L(p) = \sum_{i=0}^{n-1} \frac{1}{\mu(x_i, x_{i+1})}$$

The $\mu$-distance $d(x_i, x_j)$ between two nodes $x_i, x_j$ is the smallest $\mu$-length of any
path from $x_i$ to $x_j$, $x_i, x_j \in \tilde G$.

It can then be shown [see Rosenfeld 1975, p. 88] that d(x;, x;) is a metric (in 
undirected graphs!). 
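
Strength, $\mu$-length, and $\mu$-distance can be computed as follows. This is a minimal sketch for an undirected fuzzy graph: the three-node graph is hypothetical, the function names are assumptions, and the $\mu$-distance is obtained with Dijkstra's algorithm on the edge costs $1/\mu$, which is a choice made here rather than a method prescribed by the text.

```python
import heapq

# Hypothetical undirected fuzzy graph: unordered node pair -> arc weight.
MU = {frozenset(("A", "B")): 0.5,
      frozenset(("B", "C")): 1.0,
      frozenset(("A", "C")): 0.25}


def strength(path, mu):
    """Weight of the weakest arc along the path."""
    return min(mu[frozenset(e)] for e in zip(path, path[1:]))


def mu_length(path, mu):
    """Sum of the reciprocal arc weights along the path."""
    return sum(1.0 / mu[frozenset(e)] for e in zip(path, path[1:]))


def mu_distance(src, dst, mu):
    """Smallest mu-length of any path from src to dst (Dijkstra on 1/mu)."""
    adj = {}
    for edge, w in mu.items():
        u, v = tuple(edge)
        adj.setdefault(u, []).append((v, 1.0 / w))
        adj.setdefault(v, []).append((u, 1.0 / w))
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > best.get(u, float("inf")):
            continue  # stale queue entry
        for v, cost in adj.get(u, []):
            nd = d + cost
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")  # dst is not connected to src


print(strength(["A", "B", "C"], MU), mu_length(["A", "B", "C"], MU))
print(mu_length(["A", "C"], MU), mu_distance("A", "C", MU))
```

In this graph the direct arc from A to C is weak, so the two-arc path through B has the smaller $\mu$-length and determines the $\mu$-distance.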


Definition 6-18 


Two nodes that are joined by a path are called connected nodes. 
Connectedness is a relation that is also transitive. 


Definition 6-19 


A fuzzy graph is a forest if it has no cycles; that is, it is an acyclic fuzzy graph. 
If the fuzzy forest is connected, it is called a tree. (A fuzzy graph that is a forest 
has to be distinguished from a fuzzy graph that is a fuzzy forest.) The latter shall 
not be discussed here [see Rosenfeld 1975, p. 92]. 


Example 6-13 


The fuzzy graphs shown in figure 6-2 are forests. The graphs shown in figure
6-3 are not.

Figure 6-2. Fuzzy forests.

Figure 6-3. Graphs that are not forests.


6.3 Special Fuzzy Relations 


Relations that are of particular interest to us are fuzzy relations that pertain to the
similarity of fuzzy sets and those that order fuzzy sets. All of the relations discussed
below are reflexive, that is, $\mu_R(x, x) = 1\ \forall x \in X$ [Zadeh 1971], and they
are max-min transitive, that is, $R \circ R \subseteq R$ or $\mu_R(x, z) \ge \min \{\mu_R(x, y), \mu_R(y, z)\}$
$\forall x, y, z \in X$. It should be noted that other kinds of transitivities have been defined
[see Bezdek and Harris 1978]. These, however, will not be discussed here. The
main difference between similarity relations and order relations is the property
of symmetry or antisymmetry, respectively.

Definition 6-20


A similarity relation is a fuzzy relation $\mu_S(\cdot)$ that is reflexive, symmetrical, and
max-min transitive.


Example 6-14 


The following relation is a similarity relation [Zadeh 1971]: 


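      x1    x2    x3    x4    x5    x6
x1    1     .2    1     .6    .2    .6
x2    .2    1     .2    .2    .8    .2
x3    1     .2    1     .6    .2    .6
x4    .6    .2    .6    1     .2    .8
x5    .2    .8    .2    .2    1     .2
x6    .6    .2    .6    .8    .2    1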





A similarity relation of a finite number of elements can also be represented by a
similarity tree, similar to a dendrogram. In this tree, each level represents an
$\alpha$-cut ($\alpha$-level set) of the similarity relation. For the above similarity relation, the
similarity tree is shown below. The sets of elements on specific $\alpha$-levels can be
considered as similarity classes of $\alpha$-level.





{x1, x2, x3, x4, x5, x6}                        R_.2

{x1, x3, x4, x6}   {x2, x5}                     R_.6

{x1, x3}   {x4, x6}   {x2, x5}                  R_.8
{x1, x3}   {x4}   {x6}   {x2}   {x5}            R_1
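
The levels of the similarity tree can be recomputed from the membership matrix. The sketch below is illustrative: the matrix `S` is reconstructed from the tree itself (two elements get the highest level at which they still share a class), which is an assumption of this example, and the function name is hypothetical.

```python
# Similarity relation on x1..x6, reconstructed from the similarity tree.
S = [[1.0, 0.2, 1.0, 0.6, 0.2, 0.6],
     [0.2, 1.0, 0.2, 0.2, 0.8, 0.2],
     [1.0, 0.2, 1.0, 0.6, 0.2, 0.6],
     [0.6, 0.2, 0.6, 1.0, 0.2, 0.8],
     [0.2, 0.8, 0.2, 0.2, 1.0, 0.2],
     [0.6, 0.2, 0.6, 0.8, 0.2, 1.0]]


def similarity_classes(s, alpha):
    """Equivalence classes of the alpha-cut of a similarity relation.

    Because the alpha-cut of a similarity relation is a crisp equivalence
    relation, the class of x_i is simply every x_j with s[i][j] >= alpha.
    """
    classes, seen = [], set()
    for i in range(len(s)):
        if i in seen:
            continue
        members = [j for j in range(len(s)) if s[i][j] >= alpha]
        seen.update(members)
        classes.append([f"x{j + 1}" for j in members])
    return classes


for level in (0.2, 0.6, 0.8, 1.0):
    print(level, similarity_classes(S, level))
```

Each printed line corresponds to one level of the tree, read from the coarsest partition at level .2 to the finest at level 1.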


The properties of a similarity relation as defined in definition 6-20 are rather
restrictive and not quite in accordance with fuzzy set thinking: Reflexivity could
be considered as being too restrictive and hence weakened by replacing this
requirement with $\varepsilon$-reflexivity or weak reflexivity (cf. definition 6-10). The
max-min transitivity can be replaced by any max-* transitivity listed in definition
6-10 or in remark 6-1.

We shall now turn to fuzzy order relations: As already mentioned, similarity 
relations and order relations are primarily distinguished by their degree of sym- 
metry. Roughly speaking, similarity relations are fuzzy relations that are reflex- 
ive, (max-min) transitive, and symmetrical; order relations, however, are not 
symmetrical. To be more precise, even different kinds of fuzzy order relations 
differ by their degree of symmetry. 


Definition 6-21 


A fuzzy relation that is (max-min) transitive and reflexive is called a fuzzy pre- 
order relation. 


Definition 6-22 


A fuzzy relation that is (max-min) transitive, reflexive, and antisymmetric is
called a fuzzy order relation. If the relation is perfectly antisymmetrical, it is
called a perfect fuzzy order relation [Kaufmann 1975, p. 113]. It is also called a
fuzzy partial order relation [Zadeh 1971].


Definition 6-23 


A total fuzzy order relation [Kaufmann 1975, p. 112] or a fuzzy linear ordering
[Dubois and Prade 1980a, p. 82; Zadeh 1971] is a fuzzy order relation such that
$\forall x, y \in X$, $x \ne y$, either $\mu_R(x, y) > 0$ or $\mu_R(y, x) > 0$.

Any $\alpha$-cut of a fuzzy linear order is a crisp linear order.


Example 6-15

R is a total fuzzy order relation.

Fuzzy order relations play a very important role in models for decision making 
in fuzzy environments. We will therefore elaborate on some particularly inter- 
esting properties in the second volume, and we shall also discuss some additional 
concepts in this context. Some of the properties of the special fuzzy relations 
defined in this chapter are summarized in table 6-1. 


Table 6-1. Properties of fuzzy relations.

                                                             Anti-       Perfect
                               Reflexivity   Transitivity   symmetry    antisymmetry   Linearity   Symmetry
Fuzzy preorder                      x             x
Similarity relation                 x             x                                                    x
Fuzzy order relation                x             x             x
Perfect fuzzy order relation        x             x                          x
Total (linear) fuzzy order
relation                            x             x             x                           x


Exercises 


1. Give an example for the membership function of the fuzzy relation R :=
“considerably smaller than” in $\mathbb{R} \times \mathbb{R}$. Restrict R to the first ten natural
numbers and define the resulting matrix.

2. Let the two fuzzy sets A and B be defined as 


$$\tilde A = \{(0, .2), (1, .3), (2, .4), (3, .5)\}$$
$$\tilde B = \{(0, .5), (1, .4), (2, .3), (3, .0)\}$$
Is the following set a fuzzy relation on $\tilde A$ and $\tilde B$?

$$\{((0, 0), .2), ((0, 2), .2), ((2, 0), .2)\}$$

Give an example of a fuzzy relation on $\tilde A$ and $\tilde B$.
3. Consider the following matrix defining a fuzzy relation R on $\tilde A \times \tilde B$.




Give the first and the second projection with $\mu_{R^{(1)}}(x)$ and $\mu_{R^{(2)}}(y)$ and the
cylindrical extensions of the projection relations with $\mu_{R^{(1)L}}$ and $\mu_{R^{(2)L}}$.
4. Compose the following two fuzzy relations $R_1$ and $R_2$ by using the
- max-min composition,
- max-prod. composition, and
- max-av. composition.





6. Give an example for a reflexive transitive relation and verify remark 6-2.2. 
7. Consider the following fuzzy graph G: 







Give an example for a spanning subgraph of G! 
Give all paths from $x_1$ to $x_4$ and determine their strengths and their $\mu$-lengths.
Is the above graph a forest or a tree? 

8. In example 6-2, two relations are defined without specifying for which 
numerical values of {x;}, {y;} the relations are good interpretations of the 
verbal relations. Give examples of numerical vectors for {x;} and {y;} such 
that the relations R and Z, respectively (in the matrices), would express the
verbal description. 


7 FUZZY ANALYSIS 


7.1 Fuzzy Functions on Fuzzy Sets 


A fuzzy function is a generalization of the concept of a classical function. A clas- 
sical function f is a mapping (correspondence) from the domain D of definition 
of the function into a space S; $f(D) \subseteq S$ is called the range of f. Different features
of the classical concept of a function can be considered to be fuzzy rather than 
crisp. Therefore different “degrees” of fuzzification of the classical notion of a 
function are conceivable. 


1. There can be a crisp mapping from a fuzzy set that carries along the fuzzi- 
ness of the domain and therefore generates a fuzzy set. The image of a crisp 
argument would again be crisp. 

2. The mapping itself can be fuzzy, thus blurring the image of a crisp argument. 
This we shall call a fuzzy function. These are called “fuzzifying functions” 
by Dubois and Prade [1980a, p. 106]. 

3. Ordinary functions can have fuzzy properties or be constrained by fuzzy 
constraints. 


Naturally, hybrid types can be considered. We shall focus our considerations,
however, only on frequently used pure cases.

Definition 7-1 [Dubois and Prade 1980a; Negoita and Ralescu 1975]


A classical function $f: X \to Y$ maps from a fuzzy domain $\tilde A$ in X into a fuzzy range
$\tilde B$ in Y iff

$$\forall x \in X, \quad \mu_{\tilde B}(f(x)) \ge \mu_{\tilde A}(x)$$


Given a classical function $f: X \to Y$ and a fuzzy domain $\tilde A$ in X, the extension
principle (chapter 5.1) yields the fuzzy range $\tilde B$ with the membership function

$$\mu_{\tilde B}(y) = \sup_{x \in f^{-1}(y)} \mu_{\tilde A}(x)$$


Hence f is a function according to definition 7-1.
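
For a finite fuzzy domain the supremum becomes a maximum over the preimage, so the fuzzy range can be computed directly. The following sketch is illustrative only: the fuzzy set `A`, the function $f(x) = x^2$, and the name `image` are assumptions chosen for the example.

```python
def image(f, fuzzy_set):
    """Fuzzy range of a crisp function f applied to a discrete fuzzy set.

    fuzzy_set maps elements x to membership degrees; the result maps each
    y = f(x) to the largest membership degree found in its preimage.
    """
    out = {}
    for x, mu in fuzzy_set.items():
        y = f(x)
        out[y] = max(out.get(y, 0.0), mu)
    return out


# Hypothetical fuzzy domain A and crisp function f(x) = x^2.
A = {-1: 0.5, 0: 0.8, 1: 1.0, 2: 0.4}
B = image(lambda x: x * x, A)
print(B)

# Condition of definition 7-1: mu_B(f(x)) >= mu_A(x) for every x.
print(all(B[x * x] >= mu for x, mu in A.items()))
```

The elements -1 and 1 share the image 1, and the larger of their two degrees is the one that survives in `B`.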


Example 7-1 


Let X be the set of temperatures, Y the possible demands for energy of households,
A the fuzzy set “low temperatures,” and B the fuzzy set “high energy
demands.” The assignment “low temperatures” → “high energy demands” is then
a fuzzy function, and the additional constraint in definition 7-1 means “the lower
the temperatures, the higher the energy demands.”

The correspondence between a fuzzy function and a fuzzy relation becomes 
even more obvious when looking at the following definition. 


Definition 7-2 
Let X and Y be universes and $\tilde P(Y)$ the set of all fuzzy sets in Y (power set).
$\tilde f: X \to \tilde P(Y)$ is a mapping.
$\tilde f$ is a fuzzy function iff

$$\mu_{\tilde f(x)}(y) = \mu_{\tilde R}(x, y), \quad \forall (x, y) \in X \times Y$$

where $\mu_{\tilde R}(x, y)$ is the membership function of a fuzzy relation.


Example 7-2 


a. Let X be the set of all workers of a plant, $\tilde f$ the daily output, and y the number
of processed work pieces. A fuzzy function could then be

$$\tilde f(x) = y$$

c. X = set of all one-mile runners.
$\tilde f$ = possible record times.
$\tilde f(x) = \{y \mid y\text{: achieved record times}\}$.


7.2 Extrema of Fuzzy Functions 


Traditionally, an extremum (maximum or minimum) of a crisp function f over a
given domain D is attained at a precise point $x_0$. If the function f happens to be
the objective function of a decision model, possibly constrained by a set of other
functions, then the point $x_0$ at which the function attains the optimum is generally
called the optimal decision; that is, in classical theory there is an almost
unique relationship between the extremum of the objective function and the
notion of the optimal decision of a decision model.

In models in which fuzziness is involved, this unique relationship no longer
exists. The extremum of a function or the optimum of a decision model can be
interpreted in a number of ways: In decision models the “optimal decision” is
often considered to be the crisp set, $D_m$, that contains those elements of the fuzzy
set “decision” attaining the maximum degree of membership [Bellman and Zadeh
1970, p. 150]. We shall discuss this concept in more detail in chapter 13.

The notion of an “optimal decision” as mentioned above corresponds to the
concept of a “maximizing set” when considering functions in general.


Definition 7-3 [Zadeh 1972] 


Let f be a real-valued function in X. Let f be bounded from below by inf(f) and
from above by sup(f). The fuzzy set $\tilde M = \{(x, \mu_{\tilde M}(x))\}$, $x \in X$, with

$$\mu_{\tilde M}(x) = \frac{f(x) - \inf(f)}{\sup(f) - \inf(f)}$$

is then called the maximizing set (see figure 7-1).


Example 7-3 


$$f(x) = \sin x$$

$$\mu_{\tilde M}(x) = \frac{\sin x - \inf(\sin)}{\sup(\sin) - \inf(\sin)} = \frac{\sin x - (-1)}{1 - (-1)} = \frac{\sin x + 1}{2} = \frac{1}{2} \sin x + \frac{1}{2}$$

Figure 7-1. Maximizing set.


In definition 7—3, f is a crisp real-valued function, similar to the membership 
function of the fuzzy set “decision,” and the maximizing set provides informa- 
tion about the neighborhood of the extremum of the function f, the domain of 
which is also crisp. The case in which the domain of f is also fuzzy will be con- 
sidered in chapter 13. 
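
The maximizing set of example 7-3 is easy to evaluate numerically. In the sketch below the infimum and supremum are passed in explicitly, which assumes they are known in advance (as they are for the sine function); the helper name is hypothetical.

```python
import math


def maximizing_membership(f, f_inf, f_sup):
    """Membership function of the maximizing set of a bounded function f."""
    return lambda x: (f(x) - f_inf) / (f_sup - f_inf)


# Example 7-3: f(x) = sin x with inf = -1 and sup = 1.
mu_M = maximizing_membership(math.sin, -1.0, 1.0)

for x in (-math.pi / 2, 0.0, math.pi / 6, math.pi / 2):
    print(round(x, 4), round(mu_M(x), 4))
```

The degree of membership is 1 where the function attains its supremum, 0 where it attains its infimum, and it grows linearly with the function value in between.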

Let us now consider the extrema of fuzzy functions according to definition 7-2,
in which they are defined over a crisp domain: Since a fuzzy function $\tilde f(x)$ is a fuzzy
set, say in $\mathbb{R}$, the maximum will generally not be a point in $\mathbb{R}$ but also a fuzzy set,
which we shall call the “fuzzy maximum of $\tilde f(x)$.” A straightforward approach is to
define an extended max operation in analogy to the other extended operations
defined in chapter 5. Max and min are increasing operations in $\mathbb{R}$. The maximum
or minimum, respectively, of n fuzzy numbers, denoted by $\widetilde{\max}(\tilde M_1, \ldots, \tilde M_n)$ and
$\widetilde{\min}(\tilde M_1, \ldots, \tilde M_n)$, is again a fuzzy number. Dubois and Prade [1980a, p. 58]
present rules for computing $\widetilde{\max}$ and $\widetilde{\min}$ and also comment on the properties of
$\widetilde{\max}$ and $\widetilde{\min}$. The reader is referred to the above reference for further details.


Definition 7-4 


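Let $\tilde f(x)$ be a fuzzy function from X to $\mathbb{R}$, defined over a crisp and finite domain
D. The fuzzy maximum of $\tilde f(x)$ is then defined as

$$\tilde M = \widetilde{\max}_{x \in D}\, \tilde f(x) = \{(\sup \tilde f(x), \mu_{\tilde M}(x)) \mid x \in D\}$$

For $|D| = n$, the membership function of $\widetilde{\max}\, \tilde f(x)$ is given by the extension principle applied to the max operation:

$$\mu_{\widetilde{\max} \tilde f}(z) = \sup_{z = \max(y_1, \ldots, y_n)}\ \min_{j = 1, \ldots, n} \mu_{\tilde f(x_j)}(y_j)$$

Figure 7-2. A fuzzy function.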


Example 7-4 [Dubois and Prade 1980a, p. 105] 


Let $\tilde f(x)$ be a fuzzy function from $\mathbb{R}$ to $\mathbb{R}$ such that, for any x, $\tilde f(x)$ is a triangular
fuzzy number. The domain $D = \{x_1, x_2, x_3, x_4, x_5\}$. Figure 7-2 sketches such
a function by showing for the domain D “level curves” of $\tilde f(x)$: $f_1$ is the curve for
which $\mu_{\tilde f(x)}(f_1(x)) = 1$, and for $f_\alpha^-$ and $f_\alpha^+$, respectively,

$$\mu_{\tilde f(x)}(f_\alpha^-(x)) = \mu_{\tilde f(x)}(f_\alpha^+(x)) = \alpha$$

The triangular fuzzy numbers representing the function $\tilde f(x)$ at $x = x_1, x_2, x_3, x_4$,
and $x_5$ are shown in figure 7-3.

Figure 7-3. Triangular fuzzy numbers representing a fuzzy function.

We can make the following observation: Since the level curves in figure 7-2
are not parallel to each other, their maxima are attained at different $x_i$:
$\max f_\alpha^+(x) = f_\alpha^+(x_4)$, $\max f_1(x) = f_1(x_3)$, and $\max f_\alpha^-(x) = f_\alpha^-(x_2)$. Thus $x_1$ and $x_5$ do certainly
not “belong” to the maximum of $\tilde f(x)$. We can easily determine the fuzzy set
“maximum of $\tilde f(x)$” as defined in definition 7-4 by looking at figure 7-4 and
observing that, for

$$\alpha \in [0, \alpha^-]: \quad f_\alpha^-(x_2) \ge f_\alpha^-(x_i) \quad \forall i$$
$$\alpha \in [\alpha^-, 1]: \quad f_\alpha^-(x_3) \ge f_\alpha^-(x_i) \quad \forall i$$
$$\alpha \in [\alpha^+, 1]: \quad f_\alpha^+(x_3) \ge f_\alpha^+(x_i) \quad \forall i$$
$$\alpha \in [0, \alpha^+]: \quad f_\alpha^+(x_4) \ge f_\alpha^+(x_i) \quad \forall i$$

with $\alpha^-$ and $\alpha^+$ such that $f_{\alpha^-}^-(x_2) = f_{\alpha^-}^-(x_3)$ and $f_{\alpha^+}^+(x_4) = f_{\alpha^+}^+(x_3)$, respectively.
The maximum of $\tilde f(x)$ is therefore

$$\tilde M = \{(x_2, \alpha^-), (x_3, 1), (x_4, \alpha^+)\}$$

This set is indicated in figure 7-4 by the dashed line.

Figure 7-4. The maximum of a fuzzy function.


Dubois and Prade [1980a, p. 101] suggest additional possible interpretations 
of fuzzy extrema, which might be very appropriate in certain situations. However, 
we shall not discuss them here and rather shall proceed to consider possible 
notions of the integral of a fuzzy set or a fuzzy function. 


7.3 Integration of Fuzzy Functions 


Quite different suggestions have been made to define fuzzy integrals, integrals of 
fuzzy functions, and integrals of crisp functions over fuzzy domains or with fuzzy 
ranges. 

One of the first concepts of a fuzzy integral was put forward by Sugeno [1972,
1977], who considered fuzzy measures and suggested a definition of a fuzzy integral
that is a generalization of Lebesgue integrals: “From the viewpoint of functionals,
fuzzy integrals are merely a kind of nonlinear functionals (precisely
speaking, monotonous functionals), while Lebesgue integrals are linear ones”
[Sugeno 1977, p. 92].

We shall focus our attention on approaches along the line of Riemann inte- 
grals. The main references for the following are Dubois and Prade [1980a, 
1982b], Aumann [1965], and Nguyen [1978]. 

The classical concept of integration of a real-valued function over a closed 
interval can be generalized in four ways: The function can be a fuzzy function 
that is to be integrated over a crisp interval, or it can be integrated over a fuzzy 
interval (that is, an interval with fuzzy foundations). Alternatively, we may con- 
sider integrating a fuzzy function as defined in definitions 7—1 or 7—2 over a crisp 
or a fuzzy interval. 


7.3.1 Integration of a Fuzzy Function over a Crisp Interval 


We shall now consider a fuzzy function $\tilde f$, according to definition 7-2, which shall
be integrated over the crisp interval [a, b]. The fuzzy function $\tilde f(x)$ is supposed to
be a fuzzy number, that is, a piecewise continuous convex normalized fuzzy set on $\mathbb{R}$.

We shall further assume that the $\alpha$-level curves (see definition 2-3) $\mu_{\tilde f(x)}(y) = \alpha$
for all $\alpha \in [0, 1]$, with $\alpha$ and x as parameters, have exactly two continuous solutions,
$y = f_\alpha^+(x)$ and $y = f_\alpha^-(x)$, for $\alpha \ne 1$ and only one for $\alpha = 1$. $f_\alpha^+$ and $f_\alpha^-$ are
defined such that

$$f_{\alpha'}^+(x) \ge f_\alpha^+(x) \ge f(x) \ge f_\alpha^-(x) \ge f_{\alpha'}^-(x)$$

for all $\alpha \ge \alpha'$.
The integral of any continuous $\alpha$-level curve of $\tilde f$ over [a, b] always exists.
One may now define the integral $\tilde I(a, b)$ of $\tilde f(x)$ over [a, b] as a fuzzy set in
which the degree of membership $\alpha$ is assigned to the integral of any $\alpha$-level curve
of $\tilde f(x)$ over [a, b].


Definition 7-5 


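Let $\tilde f(x)$ be a fuzzy function from $[a, b] \subseteq \mathbb{R}$ to $\mathbb{R}$ such that $\forall x \in [a, b]$, $\tilde f(x)$
is a fuzzy number and $f_\alpha^-(x)$ and $f_\alpha^+(x)$ are $\alpha$-level curves as defined above. The
integral of $\tilde f(x)$ over [a, b] is then defined to be the fuzzy set

$$\tilde I(a, b) = \Bigl\{\Bigl(\int_a^b f_\alpha^-(x)\,dx + \int_a^b f_\alpha^+(x)\,dx,\ \alpha\Bigr)\Bigr\}$$

This definition is consistent with the extension principle according to which

$$\mu_{\int_a^b \tilde f}(y) = \sup_{g \in \mathcal{Y},\ y = \int_a^b g}\ \inf_{x \in [a, b]} \mu_{\tilde f(x)}(g(x)), \quad y \in \mathbb{R}$$

where $\mathcal{Y} = \{g: [a, b] \to \mathbb{R} \mid g \text{ integrable}\}$ (see Dubois and Prade [1980a, p. 107;
1982, p. 5]).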

The determination of the integral $\tilde I(a, b)$ becomes somewhat easier if the fuzzy
function is assumed to be of the LR type (see definition 5-6). We shall therefore
assume that $\tilde f(x) = (f(x), s(x), t(x))_{LR}$ is a fuzzy number in LR representation for
all $x \in [a, b]$. f, s, and t are assumed to be positive integrable functions on [a, b].
Dubois and Prade [1980a, p. 109] have shown that under these conditions

$$\tilde I(a, b) = \Bigl(\int_a^b f(x)\,dx,\ \int_a^b s(x)\,dx,\ \int_a^b t(x)\,dx\Bigr)_{LR}$$

It is then sufficient to integrate the mean value and the spread functions of $\tilde f(x)$
over [a, b], and the result will again be an LR fuzzy number.


Example 7-5 


Consider the fuzzy function $\tilde f(x) = (f(x), s(x), t(x))_{LR}$ with the mean function
$f(x) = x^2$, the spread functions $s(x) = x/4$, and

$$t(x) = \frac{x}{2}, \qquad L(x) = \frac{1}{1 + x^2}, \qquad R(x) = \frac{1}{1 + 2|x|}$$

Determine the integral from a = 1 to b = 4, that is, compute $\int_1^4 \tilde f$.
According to the above formula, we compute

$$\int_1^4 f(x)\,dx = \int_1^4 x^2\,dx = 21$$
$$\int_1^4 s(x)\,dx = \int_1^4 \frac{x}{4}\,dx = 1.875$$
$$\int_1^4 t(x)\,dx = \int_1^4 \frac{x}{2}\,dx = 3.75$$

This yields the fuzzy number $\tilde I(a, b) = (21, 1.875, 3.75)_{LR}$ as the value of the
fuzzy integral.
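
The three integrals of example 7-5 can be checked numerically. The sketch below uses a composite Simpson rule, which is merely a convenient choice made here (it is exact, up to rounding, for the polynomials involved); the function names are assumptions.

```python
def simpson(g, a, b, n=100):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def integrate_lr(mean, left_spread, right_spread, a, b):
    """Integral of an LR fuzzy function: integrate mean and spreads separately."""
    return (simpson(mean, a, b),
            simpson(left_spread, a, b),
            simpson(right_spread, a, b))


# Example 7-5: f(x) = x^2, s(x) = x/4, t(x) = x/2 on [1, 4].
I = integrate_lr(lambda x: x ** 2, lambda x: x / 4, lambda x: x / 2, 1.0, 4.0)
print([round(v, 6) for v in I])
```

The reference functions L and R do not enter the computation; they only determine the shape of the resulting LR fuzzy number around its mean value.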


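Some Properties of Integrals of Fuzzy Functions. Let $A_\alpha$ be the $\alpha$-level set
of the fuzzy set $\tilde A$. The support $S(\tilde A)$ of $\tilde A$ is then $S(\tilde A) = \bigcup_{\alpha \in (0, 1]} A_\alpha$. The fuzzy set
$\tilde A$ can now be written as

$$\tilde A = \bigcup_{\alpha \in [0, 1]} \alpha A_\alpha = \bigcup_{\alpha \in [0, 1]} \{(x, \mu_{\alpha A_\alpha}(x)) \mid x \in A_\alpha\}$$

where

$$\mu_{\alpha A_\alpha}(x) = \begin{cases} \alpha & \text{for } x \in A_\alpha \\ 0 & \text{for } x \notin A_\alpha \end{cases}$$

(see Nguyen [1978, p. 369]).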
Let A represent a fuzzy integral, that is, 


then 


Definition 7-6 [Dubois and Prade 1982a, p. 6]
$\int_I \tilde f$ satisfies the commutativity condition
iff $\forall \alpha \in [0, 1]$: $\bigl(\int_I \tilde f\bigr)_\alpha = \int_I \tilde f_\alpha$


Dubois and Prade [1982a, p. 6] have proved the following properties of fuzzy 
integrals, which are partly a straightforward analogy of crisp analysis. 


Theorem 7-1 


Let $\tilde f$ be a fuzzy function; then

$$\int_a^b \tilde f = -\int_b^a \tilde f$$

where the fuzzy integrals are fuzzy sets with the membership functions

$$\mu_{\int_a^b \tilde f}(u) = \mu_{\int_b^a \tilde f}(-u) \quad \forall u$$


Theorem 7-2 


Let I and I' be two adjacent intervals I = [a, b], I' = [b, c] and a fuzzy function
$\tilde f: [a, c] \to \tilde P(\mathbb{R})$. Then

$$\int_a^c \tilde f = \int_a^b \tilde f \oplus \int_b^c \tilde f$$

where $\oplus$ denotes the extended addition of fuzzy sets, which is defined in analogy
to the subtraction of fuzzy numbers (see chapter 5).
Let $\tilde f$ and $\tilde g$ be fuzzy functions. Then $\tilde f \oplus \tilde g$ is pointwise defined by

$$(\tilde f \oplus \tilde g)(u) = \tilde f(u) \oplus \tilde g(u), \quad u \in X$$

(This is a straightforward application of the extension principle from chapter 5.1.)


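Theorem 7-3


Let $\tilde f$ and $\tilde g$ be fuzzy functions whose supports are bounded. Then

$$\int_I (\tilde f \oplus \tilde g) \supseteq \int_I \tilde f \oplus \int_I \tilde g \qquad (7.1)$$
$$\int_I (\tilde f \oplus \tilde g) = \int_I \tilde f \oplus \int_I \tilde g \qquad (7.2)$$

where (7.2) holds iff the commutativity condition is satisfied for $\int_I \tilde f$ and $\int_I \tilde g$.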


7.3.2 Integration of a (Crisp) Real-Valued Function over a Fuzzy 
Interval 


We now consider a case for which Dubois and Prade [1982a, p. 106] proposed a
quite interesting solution: A fuzzy domain $\mathcal{D}$ of the real line $\mathbb{R}$ is assumed to be
bounded by two normalized convex fuzzy sets, the membership functions of
which are $\mu_{\tilde a}(x)$ and $\mu_{\tilde b}(x)$, respectively. (See figure 7-5.) $\mu_{\tilde a}(x)$ and $\mu_{\tilde b}(x)$ can be
interpreted as the degrees (of confidence) to which x can be considered a lower
or upper bound of $\mathcal{D}$. If $a_0$ and $b_0$ are the lower/upper limits of the supports of $\tilde a$
or $\tilde b$, then $a_0$ and $b_0$ are related to each other by $a_0 = \inf S(\tilde a) \le \sup S(\tilde b) = b_0$.


Definition 7-7 


Let f be a real-valued function that is integrable in the interval $J = [a_0, b_0]$; then
according to the extension principle, the membership function of the integral
$\int_{\mathcal{D}} f$ is given by

$$\mu_{\int_{\mathcal{D}} f}(z) = \sup_{x, y \in J,\ z = \int_x^y f} \min(\mu_{\tilde a}(x), \mu_{\tilde b}(y))$$

Figure 7-5. Fuzzily bounded interval.


Let $F(x) = \int_c^x f(y)\,dy$, $c \in J$ (F is the antiderivative of f). Then, using the extension
principle again, the membership function of $F(\tilde a)$, $\tilde a \in \tilde P(\mathbb{R})$, is given by

$$\mu_{F(\tilde a)}(z) = \sup_{x:\ z = F(x)} \mu_{\tilde a}(x)$$


Proposition 7-1 [Dubois and Prade 1982b, p. 106]

$$\int_{\mathcal{D}} f = F(\tilde b) \ominus F(\tilde a)$$

where $\ominus$ denotes the extended subtraction of fuzzy sets.

Proofs of proposition 7-1 and of the following propositions can be found in
Dubois and Prade [1982b, pp. 107-109].

A possible interpretation of proposition 7-1 is as follows: If $\tilde a$ and $\tilde b$ are
normalized convex fuzzy sets, then $\int_{\mathcal{D}} f$ is the interval between “worst” and
“best” values for different levels of confidence indicated by the respective degrees
of membership (see also Dubois and Prade [1988a, pp. 34-36]).


Example 7-6 
Let

$$\tilde a = \{(4, .8), (5, 1), (6, .4)\}$$
$$\tilde b = \{(6, .7), (7, 1), (8, .2)\}$$
$$f(x) = 2, \quad x \in [a_0, b_0] = [4, 8]$$

Then

$$\int_{\mathcal{D}} f(x)\,dx = \int_a^b 2\,dx = 2x \Big|_a^b$$

The detailed computational results are:

(a, b)     $\int_a^b f(x)\,dx$     $\min(\mu_{\tilde a}(a), \mu_{\tilde b}(b))$
(4, 6)            4                        .7
(4, 7)            6                        .8
(4, 8)            8                        .2
(5, 6)            2                        .7
(5, 7)            4                        1
(5, 8)            6                        .2
(6, 6)            0                        .4
(6, 7)            2                        .4
(6, 8)            4                        .2

Hence choosing the maximum of the membership values for each value of the
integral yields $\int_{\mathcal{D}} f = \{(0, .4), (2, .7), (4, 1), (6, .8), (8, .2)\}$.
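
The table of example 7-6 follows mechanically from definition 7-7. The sketch below assumes discrete fuzzy bounds stored as dictionaries and an antiderivative F supplied by the caller; the function name is hypothetical.

```python
def fuzzy_interval_integral(F, lower, upper):
    """Integral of a crisp function over a fuzzily bounded interval.

    F is an antiderivative of the integrand; lower and upper map candidate
    bounds to their membership degrees. Each pair (a, b) contributes the
    value F(b) - F(a) with degree min(mu_lower(a), mu_upper(b)); equal
    values keep the largest degree.
    """
    out = {}
    for a, mu_a in lower.items():
        for b, mu_b in upper.items():
            z = F(b) - F(a)
            out[z] = max(out.get(z, 0.0), min(mu_a, mu_b))
    return out


# Example 7-6: f(x) = 2, so F(x) = 2x.
a_tilde = {4: 0.8, 5: 1.0, 6: 0.4}
b_tilde = {6: 0.7, 7: 1.0, 8: 0.2}
result = fuzzy_interval_integral(lambda x: 2 * x, a_tilde, b_tilde)
print(sorted(result.items()))
```

The nested loop visits exactly the nine pairs (a, b) listed in the example, and the final dictionary is the fuzzy set obtained there.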


Some properties of the integral discussed above are listed in propositions 7—2 
to 7—4 below. Their proofs, as well as descriptions of other approaches to “fuzzy 
integration,” can again be found in Dubois and Prade [1982a, pp. 107—108]. 


Proposition 7—2 
Let f and g be two functions $f, g: I \to \mathbb{R}$, integrable on I. Then

$$\int_{\mathcal{D}} (f + g) \subseteq \int_{\mathcal{D}} f \oplus \int_{\mathcal{D}} g$$

where $\oplus$ denotes the extended addition (see chapter 5).


Example 7-7 
Let 
$$f(x) = 2x - 3$$
$$g(x) = -2x + 5$$
$$\tilde a = \{(1, .8), (2, 1), (3, .4)\}$$
$$\tilde b = \{(3, .7), (4, 1), (5, .3)\}$$

So

$$\int_a^b f(x)\,dx = \bigl[x^2 - 3x\bigr]_a^b$$
$$\int_a^b g(x)\,dx = \bigl[-x^2 + 5x\bigr]_a^b$$
$$\int_a^b (f(x) + g(x))\,dx = \bigl[2x\bigr]_a^b$$

In analogy to example 7-6, we obtain

$$\int_{\mathcal{D}} f = \{(0, .4), (2, .7), (4, .4), (6, 1), (10, .3), (12, .3)\}$$
$$\int_{\mathcal{D}} g = \{(-6, .3), (-4, .3), (-2, 1), (0, .8), (2, .7)\}$$

Applying the formula for the extended addition according to the extension principle
(see section 5.3) yields

$$\int_{\mathcal{D}} f \oplus \int_{\mathcal{D}} g = \{(-6, .3), (-4, .3), (-2, .4), (0, .7), (2, .7), (4, 1), (6, .8), (8, .7), (10, .3), (12, .3), (14, .3)\}$$

Similarly to example 7-6, we compute

$$\int_{\mathcal{D}} (f + g) = \{(0, .4), (2, .7), (4, 1), (6, .8), (8, .3)\}$$

Now we can easily verify that

$$\int_{\mathcal{D}} f \oplus \int_{\mathcal{D}} g \supseteq \int_{\mathcal{D}} (f + g)$$
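
The containment in example 7-7 can be reproduced with a short script. This is a sketch under the same assumptions as the example (discrete fuzzy bounds, antiderivatives known in closed form); the helper names are hypothetical.

```python
def fuzzy_interval_integral(F, lower, upper):
    """Sup-min image of F(b) - F(a) over all pairs of fuzzy bounds."""
    out = {}
    for a, mu_a in lower.items():
        for b, mu_b in upper.items():
            z = F(b) - F(a)
            out[z] = max(out.get(z, 0.0), min(mu_a, mu_b))
    return out


def extended_add(u, v):
    """Extended addition of two discrete fuzzy sets (sup-min convolution)."""
    out = {}
    for zu, mu_u in u.items():
        for zv, mu_v in v.items():
            out[zu + zv] = max(out.get(zu + zv, 0.0), min(mu_u, mu_v))
    return out


a_tilde = {1: 0.8, 2: 1.0, 3: 0.4}
b_tilde = {3: 0.7, 4: 1.0, 5: 0.3}

int_f = fuzzy_interval_integral(lambda x: x * x - 3 * x, a_tilde, b_tilde)
int_g = fuzzy_interval_integral(lambda x: -x * x + 5 * x, a_tilde, b_tilde)
int_sum = fuzzy_interval_integral(lambda x: 2 * x, a_tilde, b_tilde)
total = extended_add(int_f, int_g)

# Containment: every degree of the integral of f + g is bounded by the sum.
contained = all(total.get(z, 0.0) >= mu for z, mu in int_sum.items())
print(sorted(total.items()))
print(contained, total == int_sum)
```

The two sides differ, for example, at the value 8, where the extended sum has degree .7 but the integral of f + g only reaches .3, so the inclusion is strict here.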


Proposition 7-3 


If $f, g: I \to \mathbb{R}^+$ or $f, g: I \to \mathbb{R}^-$,
then equality holds:

$$\int_{\mathcal{D}} (f + g) = \int_{\mathcal{D}} f \oplus \int_{\mathcal{D}} g$$


Proposition 7—4 


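Let $\mathcal{D} = (\tilde a, \tilde b)$, $\mathcal{D}' = (\tilde a, \tilde c)$, and $\mathcal{D}'' = (\tilde c, \tilde b)$. Then the following relationships
hold:

$$\int_{\mathcal{D}} f \subseteq \int_{\mathcal{D}'} f \oplus \int_{\mathcal{D}''} f \qquad (7.3)$$
$$\int_{\mathcal{D}} f = \int_{\mathcal{D}'} f \oplus \int_{\mathcal{D}''} f \quad \text{iff } c \in \mathbb{R} \qquad (7.4)$$


7.4 Fuzzy Differentiation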


In analogy to integration, differentiation can be extended to fuzzy mathematical 
structures. 

The results will, of course, depend on the type of function considered. In terms 
of section 7.1, we will focus our attention on functions that are not fuzzy them- 
selves but that only “carry” the possible fuzziness of their arguments. Differen- 
tiation of fuzzy functions is considered by Dubois and Prade [1980a, p. 116; 
1982b, p. 227]. 

Here we shall consider only differentiation of a differentiable function
$f: \mathbb{R} \supseteq [a, b] \to \mathbb{R}$ at a “fuzzy point.” A “fuzzy point” $\tilde X_0$ [Dubois and Prade
1982b, p. 225] is a convex fuzzy subset of the real line $\mathbb{R}$ (see definition 2-4).

In the following, fuzzy points will be considered for which the support is contained
in the interval [a, b], that is, $S(\tilde X_0) \subseteq [a, b]$.

Such a fuzzy point can be interpreted as the possibility distribution of a point
x whose precise location is only approximately known.

The uncertainty of the knowledge about the precise location of the point
induces an uncertainty about the derivative $f'(x)$ of a function $f(x)$ at this point.
The derivative might be the same for several x belonging to [a, b]. The possibility
of $f'(\tilde X_0)$ is therefore defined [Zadeh 1978] to be the supremum of the values
of the possibilities of $f'(x) = t$, $x \in [a, b]$.

The “derivative” of a real-valued function at a fuzzy point can be interpreted
as the fuzzy set $f'(\tilde X_0)$, the membership function of which expresses the degree
to which a specific $f'(x)$ is the first derivative of a function f at point $\tilde X_0$.


Definition 7-8 


The membership function of the fuzzy set “derivative of a real-valued function
at a fuzzy point $\tilde X_0$” is defined by the extension principle as

$$\mu_{f'(\tilde X_0)}(y) = \sup_{x \in f'^{-1}(y)} \mu_{\tilde X_0}(x)$$

where $\tilde X_0$ is the fuzzy number that characterizes the fuzzy location.


Example 7-8 
Let 
$$f(x) = x^3$$
$$\tilde X_0 = \{(-1, .4), (0, 1), (1, .6)\}$$

be a fuzzy location.

Because of $f'(x) = 3x^2$, we obtain $f'(\tilde X_0) = \{(0, 1), (3, .6)\}$ as derivative of a
real-valued function at the fuzzy point $\tilde X_0$.
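
Example 7-8 can be reproduced by pushing the fuzzy point through the derivative. The sketch below assumes a discrete fuzzy point and a derivative supplied in closed form; the function name is hypothetical.

```python
def derivative_at_fuzzy_point(df, fuzzy_point):
    """Fuzzy set f'(X0): sup of mu_X0(x) over all x with df(x) = y."""
    out = {}
    for x, mu in fuzzy_point.items():
        y = df(x)
        out[y] = max(out.get(y, 0.0), mu)
    return out


# Example 7-8: f(x) = x^3, hence f'(x) = 3x^2.
X0 = {-1: 0.4, 0: 1.0, 1: 0.6}
d = derivative_at_fuzzy_point(lambda x: 3 * x ** 2, X0)
print(sorted(d.items()))
```

The points -1 and 1 give the same slope 3, so the larger of their two degrees, .6, is the one assigned to it.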


Proposition 7-5 


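The extended sum $\oplus$ of the derivatives of two real-valued functions f and g at
the fuzzy point $\tilde X_0$ is defined by

$$\mu_{(f' + g')(\tilde X_0)}(y) = \sup_{x:\ y = f'(x) + g'(x)} \mu_{\tilde X_0}(x)$$

Hence

$$f'(\tilde X_0) \oplus g'(\tilde X_0) \supseteq (f' + g')(\tilde X_0)$$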


Proposition 7-6 [Dubois and Prade 1982b, p. 227]


If $f'$ and $g'$ are continuous and both are nondecreasing or nonincreasing, then

$$f'(\tilde X_0) \oplus g'(\tilde X_0) = (f' + g')(\tilde X_0)$$


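Proposition 7-7 (Product rule of differentiation)


1. $(f \cdot g)'(\tilde X_0) = (f'g + fg')(\tilde X_0) \subseteq [f'(\tilde X_0) \odot g(\tilde X_0)] \oplus [f(\tilde X_0) \odot g'(\tilde X_0)]$
2. If $f$, $g$, $f'$, and $g'$ are continuous, f and g are both positive, and $f'$ and $g'$ are
both nondecreasing (or f and g are negative and $f'$ and $g'$ are nondecreasing), then

$$(f \cdot g)'(\tilde X_0) = [f'(\tilde X_0) \odot g(\tilde X_0)] \oplus [f(\tilde X_0) \odot g'(\tilde X_0)]$$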


Exercises 


1. Determine the maximizing set of

$$f(x) = \begin{cases} 2x^2 - 3 & -2 \le x \le 2 \\ 5 & \text{else} \end{cases}$$

2. Show that computing $\int_a^b \tilde f$ according to the extension principle yields the
usual integral if f is a crisp function.
3. Let $\tilde f(x) = (f(x), s(x), t(x))_{LR}$ with
f(x) =nx


1 
s(x) = — 
@) lx|+1 


tx) = —--_— 
o) 1+sin?x 





L(x) = - 
1+2|x| 


Maa 


Determine f(x) explicitly for x = .5, x = 1, and x = 2. Compute the integral 
I(a, b). 
4. Let fix) = 2x° +(x- 1)’, 


$$\tilde X_0 = \{(-1, .5), (0, .8), (1, 1), (2, .6), (3, .4)\}$$


Compute $f'(\tilde X_0)$. Verify that proposition 7-6 holds.
5. Let $\tilde X_0 = \{(-1, .4), (0, 1), (1, .6)\}$,


f)=x?4+2 g(x)=2x4+3 
Compute f’(X,). Verify that proposition 7-6 holds. 


8 UNCERTAINTY MODELING 


8.1 Application-oriented Modeling of Uncertainty 


As already mentioned in section 1.1, the type of uncertainty modeling chosen is 
entirely up to the modeler if and when a formal model is under consideration 
which does not pretend to model reality correctly. 

If, however, the modeler is faced with a real application, then he still has a 
certain freedom of choice but he is also limited by the character of the piece of 
reality he wants to model. 

The modeler of such a problem will have to decide whether he wants to 
consider uncertainty—defined in whatever way—explicitly in his model or not. 
He might, for instance, prefer to approximate the uncertain phenomenon by a 
certain (deterministic) model. Alternatively he might include so much “slack” in
his model that he is “on the safe side” concerning uncertainty, or he might prefer
a “wait and see” solution, postponing the decision until, with the passage of time,
uncertainty has almost disappeared. This would amount to reducing the influence
of uncertainty by reducing its causes which, of course, have to be known in this 
case. In either of the above cases the modeler does not have to choose any 
specific method for modeling uncertainty. In the rest of the chapter we shall focus 
on those cases in which the modeler decides to model uncertainty explicitly. 

Until the 1960s probability theories and statistics were the only methods to
model uncertainty which has always been considered by scientists as a rather dis- 
turbing feature of some scientific statements, of systems, phenomena or even in 
philosophy. Since the 1960s additional theories have been suggested as tools 
to model uncertainty. Some of these theories or their supporters even claim to 
be the only proper tool for modeling uncertainty, even though the notion of 
uncertainty has never been defined uniquely. 

It has been defined in specific contexts—mainly formal theories—but then the 
semantic interpretation is generally restricted to this field. In decision logic, for 
instance, “decisions under uncertainty” are defined as acts of choice for which
the state of nature that will occur is unknown. Unfortunately, as Schneider
observed, such situations occur very seldom in practice, if at all
[Schneider 1979].

One would expect to find an appropriate definition of uncertainty either in 
lexica or in scholarly books on “uncertainty” modeling [Goodman and Nguyen 
1985, Klir and Folger 1988, Klir 1987]. Surprisingly enough, I have not
succeeded in finding any general definition of it.

The first question one should probably ask is whether uncertainty is a 
phenomenon, a feature of real world systems, a state of mind or a label for a 
situation in which a human being wants to make statements about phenomena 
(i.e. reality, models, theories). One can also ask whether “uncertainty” is an 
objective fact or just a subjective impression which is closely related to 
individual persons. 

Whether uncertainty is an objective feature of physical real systems seems to 
be a philosophical question. In the following we shall not consider these “objec- 
tive uncertainties” if they exist, but we shall focus on the human-related, sub- 
jective interpretation of “uncertainty” which depends on the quantity and quality 
of information which is available to a human being about a system or its 
behavior that the human being wants to describe, predict or prescribe. 

In this respect it shall not matter whether the information is inadequate due
to the specific individual or whether it is due to the present state of knowledge,
i.e. whether the information is not available at present to anybody. Figure 8—1 
depicts our view of uncertainty used in this chapter. 

In this figure the “system” denotes the phenomenon about which judgments 
are to be made. This can be parts of the physical reality, socio-economic systems, 
man-made systems or any other type of phenomena. Information or data emitted 
by the system might be impulses, visible or measurable properties (noise, temperature
etc.). These data or information are, however, very often not considered
directly by the “observer”. They are rather the input to an uncertainty theory
(e.g. probability theory), which processes this information in specified ways and 
supplies the observer with certain “measures of uncertainty” (e.g. mean values,
variances etc.) or descriptions of uncertainty (e.g. probability distributions etc.).
Hence, the observer does not perceive the information about the phenomenon
directly but only after it has been “filtered” by the uncertainty theory used.


Figure 8-1. Uncertainty as situational property. (Diagram elements: phenomenon, information, perception, (human) observer.)

The most important aspects of this view are: 


1. “Causes” of uncertainty influence the information flow between the observed 
system and the uncertainty model (paradigm chosen by the observer). 

2. A selected uncertainty model or theory has to be appropriate to the available 
quantity and quality of input information. 

3. A chosen uncertainty theory also determines the type of information 
processing applied to available data or information. 

4. For pragmatic reasons the information offered to the observer (human or 
other) by the uncertainty model should be in an adequate language. 

5. Hence, the choice of an appropriate “uncertainty” calculus may depend on
   • the causes of uncertainty,
   • the quantity and quality of information available,
   • the type of information processing required by the respective “uncertainty” calculus, and
   • the language required by the final observer.


Even this notion of uncertainty is rather vague, has many different appearances 
and many different causes. It is, therefore, difficult to define it properly and in 
sufficient generality. Any definition of uncertainty is in a way arbitrary and 
subjective. It can be more or less extreme with respect to the situation. Here we
choose a rather broad definition of uncertainty in order to include a large number
of possible situations which can be considered “uncertain”.

Definition 8-1: A proposed definition of uncertainty


Uncertainty implies that in a certain situation a person does not have at his disposal
information which quantitatively and qualitatively is appropriate to describe,
prescribe or predict deterministically and numerically a system, its behavior or
other characteristics.

“Situation” in the context of this definition includes features of the system as 
well as expectations or needs of the observer. The need to describe a phenome- 
non numerically was included because most of the known measures of uncer- 
tainty require a numerical description. In some situations a symbolic description 
of the phenomenon may be sufficient for the human observer to judge the situa- 
tion (e.g. the color of the traffic lights at a road intersection). But in this case he 
knows in addition to the color the meaning of the color and he will not be in a 
position to make statements about the traffic behavior at an intersection without 
involving numbers. 

It seems that a lot of misunderstandings have been caused by confusing 
the “type of uncertainty” with the “cause of uncertainty” or with the theory 
which is used to model uncertainty. I shall, therefore, attempt to describe in the 
following these three aspects of uncertainty separately in order to arrive at a 
certain taxonomy of uncertainty, the classes of which may be neither disjoint
nor exhaustive.


8.1.1 Causes of Uncertainty 


Lack of Information. Lack of information is probably the most frequent cause 
for uncertainty. In decision logic, for instance, one calls “decisions under uncer- 
tainty” the situation in which a decision maker does not have any information 
about which of the possible states of nature will occur. This would obviously be 
a quantitative lack of information. With “decision making under risk” one nor- 
mally describes a situation in which the decision maker knows the probabilities 
for the occurrence of various states. This could be called a qualitative lack of 
information. Since information about the occurrence is available, it can also be 
considered complete in the sense of the availability of a complete probability 
function. But the kind of the available information is not sufficient to describe 
the situation deterministically. Another situation characterized by a lack of infor- 
mation might be called “approximation”. Here one does not have or one does not 
want to gather sufficient information to make an exact description, even though 
this might be possible. In some cases the description of the system is explicitly 
called an “approximation”, in other situations this is hidden and probably not
visible to the normal observer. Examples for the latter case can be found in mathematics,
where symbols are used rather than real numbers because a description
by real numbers is not feasible (for instance the “number” π, sin and cosine
functions, or any complex or transcendental numbers). In this context the scale
level on which numerical information is available also has to be considered. The
situation of “certainty” normally assumes an absolute or at least a cardinal scale
level of the information available. If only information on a ratio, ordinal or
nominal scale level is available, this would also be called a “qualitative lack of
information” in our view.

A transition from a situation of uncertainty caused by a lack of information to
a situation of certainty can obviously only be achieved by gathering more or better
information. Whether this is possible or desirable depends on the
situation and the goal of modeling.


Abundance of Information (Complexity). This type of uncertainty is due to 
the limited ability of human beings to perceive and process simultaneously large 
amounts of data [Newell and Simon 1972]. This situation is exemplified by real 
world situations in which more data is objectively available to human beings than 
they can “digest” or by situations in which human beings communicate about 
phenomena which are defined or described by a large number of features or properties.
In these situations people normally transform the
available data into perceivable information by using a coarser grid or a rougher
“granularity” or by focusing their attention on those features which seem to them
most important and neglecting all other information or data. If such a situation
occurs in scientific activities, very often some kind of “scaling” is used to the
same end. It is obvious that in these situations a transition to “certainty” cannot be
achieved by gathering even more data, but rather by transforming available data
into appropriate information.


Conflicting Evidence. Uncertainty might also be due to conflicting evidence,
i.e. there might be considerable information available pointing to a certain behavior
of a system and, additionally, information available pointing
to another behavior of the system. If the two classes of available information
are conflicting, then an increase of information might not reduce uncertainty at
all, but rather increase the conflict. The reasons for this conflict of evidence can
differ. It can be due to the fact that some of the information available
is wrong (but not identifiable as wrong information by the system), it can
also be that information on non-relevant features of the system is being used,
or it might be that the model which the observer has of the system is wrong, etc.
In this case a transition to a situation of certainty might call for checking the
available information again with respect to its correctness rather than gathering
more information or putting the information on a rougher grid. In some cases,
however, deleting some pieces of information might reduce the conflict and move
the situation closer to certainty.


Ambiguity. By ambiguity we mean a situation in which certain linguistic 
information, for instance, has entirely different meanings or in which—mathe- 
matically speaking—we have a one-to-many mapping. All languages contain 
certain words which for several reasons have different meanings in different 
contexts. A human observer can normally interpret the word correctly
if he knows its context. In this respect, this type of uncertainty
could also be classified under “lack of information”, because in this case
adding more information about the context of the word may move us from
uncertainty to certainty.


Measurement. The term “measurement” also has very different interpretations
in different areas [Zimmermann and Zysno 1980]. In the context of this
chapter we mean “measurement” in the sense of “engineering measurement”, i.e.
the use of measuring devices to determine physical features, such as weight, temperature,
length etc.

The quality of our measuring technology has increased with time and the 
further this technology improves, the more exactly it can determine properties of 
physical systems. As long, however, as an “imagined” exact property cannot yet 
be measured perfectly, we have some uncertainty about the real measure and we 
only know the indicated measure. This is certainly also a type of uncertainty
which could be considered a “lack of information”. It is only treated
as a separate class in this chapter due to the particular importance of this type
of uncertainty to engineering.


Belief. Finally, we would like to mention as a cause of uncertainty situations
in which all information available to the observer is subjective, as a kind of belief
in a certain situation. This situation is probably the most disputable, and it could also
be considered as “lack of information” in the objective sense.

A possible interpretation of this situation is, however, also that a human being 
develops on the basis of available (objective) data and in a way which is unknown 
to us (subjective) beliefs which he afterwards considers as information about a 
system that he wants to describe or prescribe. The distinction of this class from 
the classes mentioned above is actually that, so far, we always have considered 
“objective” information and now we are moving to “subjective” information. 
Whether this distinction can and should be upheld at all is a matter for 
further discussion. 


8.1.2 Type of Available Information


So far we have discussed causes of uncertainty which in most cases depend on 
the quality or quantity of available information. As already mentioned, however, 
we will have to consider the type of available information in a situation which 
we want to judge with respect to uncertainty in more detail: the information 
which is available for a system under consideration can, roughly speaking, be 
numerical, linguistic, interval-valued or symbolic. 


Numerical Information. In our definition of certainty we required that a
system can be described numerically. This normally requires that the information
about the system is also available numerically. Since this numerical information 
can come from quite a variety of sources, it is not sufficient to require just that 
the information is given in numbers, but we also have to determine the scale level 
on which this information is provided [Sneath and Sokal 1973]. This determines 
the type of information processing (mathematical operation) which we can apply
to this information legitimately without presuming information which is not
available. There are quite a number of taxonomies for scale levels, such as, for
instance, distinguishing between nominal scale level, ordinal scale level, ratio 
scale level, interval scale level and absolute scale level. For our purposes we refer 
the reader to table 16-1. 

Roughly speaking, a nominal scale level indicates that the information pro- 
vided (even though in numerical form) only has the function of a name (such as 
the number on the back of a football player or a license plate of a car), that numer- 
ical information on an ordinal scale level provides information of an ordering 
type and information on a cardinal scale level also indicates information about 
the differences between the ordered quantities, i.e. contains a metric. 


Interval-Information. In this case information is available, but not as precise 
in the sense of a real-valued number as above. If we want to process this infor- 
mation properly, we will have to use interval arithmetic and the outcome will 
again be interval-valued information. It should be clear, however, that this infor- 
mation is also “exact” or “dichotomous” in the sense that the boundaries of the 
intervals, no matter how they have been determined, are “crisp”, “dichotomous”, 
or “exact”. 
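
To illustrate how interval-valued information is processed, the following is a minimal sketch of interval arithmetic. The class name `Interval` and the example quantities are hypothetical; only addition, subtraction and multiplication are shown, and the interval bounds are crisp, as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with crisp boundaries."""
    lo: float
    hi: float

    def __post_init__(self):
        if self.lo > self.hi:
            raise ValueError("lower bound must not exceed upper bound")

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # Smallest difference: own lower bound minus other's upper bound.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # The extremes of a product lie among the four products of the bounds.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))


# Hypothetical measurements known only up to an interval.
length = Interval(2, 3)
width = Interval(-1, 4)

print(length + width)
print(length - width)
print(length * width)
```

Every result is again an interval, so the outcome of the computation stays on the same, interval-valued level of information as its inputs.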


Linguistic Information. By linguistic information we mean that the informa- 
tion provided is given in a natural language and not in a formal language [Bellman 
and Zadeh 1970]. The properties of this type of information obviously differ from
those of either numerical information or of information in a formal language. 
Natural languages develop over time, they depend on cultural backgrounds, they 
depend on educational backgrounds of the persons using this language and on
many other things. One also has to distinguish between a word as a label and the 
meaning of a word. Very often there is neither a one-to-one relationship between 
these two nor are the meanings of words defined in a crisp and a context- 
independent way. In contrast to numerical information there are also hardly any
measures of quality of information for natural languages (e.g. there are no defined
scale levels for linguistic information). Linguistic information has developed as
a means of communication between human beings, and the “inference engines”
are the minds of people, about which still far too little is known.


Symbolic Information. Very often information is provided in the form of 
symbols. This is obvious when numbers, letters or pictures are being used as 
symbols. This is often not as obvious if words are being used as symbols because 
sometimes it seems to be suggested or assumed that words have natural meanings 
while symbols do not. Hence, if symbolic information is provided, the information 
is as valuable as the definitions of the symbols are and the type of information 
processing also has to be symbolic and neither numerical nor linguistic. 


8.1.3 Uncertainty Methods 


As depicted in figure 8-1, information about the uncertain phenomenon is filtered
by an uncertainty method before it is offered to the observer. By “uncertainty 
methods” we mean any of the probability theories, fuzzy set theory, rough set 
theory, evidence theory etc. These theories build on certain axioms with respect 
to the uncertainty to be modeled and they propose generally a mathematical 
framework to arrive at measures of uncertainty [Dubois and Prade 1989]. The 
mathematical models or methods suggested require a certain scale level of numer- 
ical information. Hence, a specific uncertainty method should not be used if its 
mathematical operations require a higher scale level than that on which the 
available information is provided. This is very often neglected when applying 
those theories. Rather one assumes, without checking, that numerical informa- 
tion is available on a cardinal or absolute scale level for which all mathematical 
operations would be legitimate. 
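
The requirement that an operation must not demand a higher scale level than the one on which the information is provided can be sketched as a simple check. This is an illustration only: the names `ScaleLevel`, `REQUIRED_LEVEL` and `is_legitimate` are hypothetical, and the assignment of mode, median and mean to scale levels follows the usual measurement-theoretic convention rather than anything stated in this chapter.

```python
from enum import IntEnum


class ScaleLevel(IntEnum):
    """Scale levels ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    CARDINAL = 3


# Illustrative assumption: the minimum scale level each operation needs.
REQUIRED_LEVEL = {
    "mode": ScaleLevel.NOMINAL,    # only counts equal labels
    "median": ScaleLevel.ORDINAL,  # needs an ordering
    "mean": ScaleLevel.CARDINAL,   # needs meaningful differences
}


def is_legitimate(operation, available):
    """True if the available information is on a sufficient scale level."""
    return available >= REQUIRED_LEVEL[operation]


for op in REQUIRED_LEVEL:
    print(op, is_legitimate(op, ScaleLevel.ORDINAL))
```

With ordinal information the check admits the mode and the median but rejects the mean, which is the kind of test that is often skipped when a method is applied.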

To an increasing degree, moreover, uncertain information or information about
“uncertainties” is also processed in knowledge-based systems [Zimmermann
1988, Kandel and Langholz 1992, Klein and Methlie 1995, Turban 1988], which
either essentially perform symbol processing (classical
expert system technology) or perform meaning-preserving inference.
Obviously, for these systems different requirements exist and different types of
information are offered at the end. Finally, information can be processed
heuristically, i.e. according to well-defined procedures which can also require
other types of languages.

To model, i.e. describe, prescribe or predict, a system or the behavior of a 
system normally serves a certain purpose. It could serve a human observer, it 
could be the input to another mechanical or electronic system, it could be used 
for other mathematical algorithms etc. In figure 8—1 a human observer was con- 
sidered as the recipient of the information. In this case the information does not 
only have to be “readable” by the recipient, but it may have to meet additional 
requirements, depending on what it is intended for. If the observer wants to rec- 
ognize certain patterns, a nominal scale level of the received information might 
already be sufficient. If he wants to evaluate or order phenomena, information 
will have to be at least on an ordinal scale level, etc. Hence, the information 
about the uncertain system will have to be provided in a suitable language, i.e. 
either numerical, in the form of intervals, linguistically or symbolically, and on 
an appropriate scale level. 


8.1.4 Uncertainty Theories as Transformers of Information 


Sections 8.1.1 to 8.1.3 of this chapter focused on informational features of 
the uncertain phenomenon. The uncertainty calculus, theory or method used to 
describe this phenomenon should obviously be compatible with the features of 
the phenomenon, i.e. not require information on a higher level than provided, not 
make any axiomatic assumptions about the cause of uncertainty etc. which are 
not satisfied by the real situation. 

This certainly contradicts views that, for instance, any uncertainty can be 
modeled by probabilities, or by fuzzy sets, or by possibilities, or by any other 
single method. We do not believe that there exists any single method which is 
able to model all types of uncertainty equally well. 

Most of the established theories and methods for uncertainty modeling are 
focused either on specific “types of uncertainty” defined by their causes or they 
at least imply certain causes and they also require specific types or qualities of 
information depending on the type of information processing they use. One could 
consider these uncertainty methods and their paradigms as glasses through which 
we consider uncertain situations or, in other words: there is no “probabilistic
uncertainty” as distinct from “possibilistic uncertainty”. One is rather looking at
an uncertain situation with the properties that were specified before and one tries 
to model this uncertain situation by means of probability theory or by means of 
possibility theory. Hence, the theory which is appropriate to model a specific 
uncertainty situation should be determined by the properties of this situation as
specified above and by the requirements of the observer. At present there exist
numerous uncertainty theories, such as: various probability theories, evidence
theory [Shafer 1976], possibility theory [Dubois and Prade 1988], fuzzy set
theory, grey set theory, intuitionistic set theory [Atanassov 1986], rough set theory
[Pawlak 1985], interval arithmetic, convex modeling [Ben-Haim and Elishakoff
1990], etc. Some of these theories are contained in other theories, a relationship which shall not
be investigated here.

We would like to point to one fact, however, which is sometimes overlooked: 
uncertainty theories are often not homogeneous with respect to their information 
processing or requirements as to the quality of information. Fuzzy set theory, for 
instance, claims to process linguistic information. The formal presentation of 
this information can be quite different. If singletons are used, this corresponds to 
symbol processing. If linguistic variables are used, the membership functions of 
the terms are processed. They can be on various scale levels and will, therefore,
determine which operators, i.e. mathematical operations, may be used and
which may not.

Whether an uncertainty theory uses mathematical, heuristic or knowl- 
edge-based information processing or inference will also influence the type of 
required input information and the quality of the information offered to the 
observer. 


8.1.5 Matching Uncertainty Theory and Uncertain Phenomena 


Considering uncertainty as an informational feature of a situation or a phenom- 
enon, it can be described by a 4-component vector. In this vector the four 
components describe the four dimensions which are roughly sketched in 
table 8—1. 

Essentially each uncertainty theory can also be characterized by such a vector 
or profile. Optimally the profile of the theory should match the profile of the 
situation it is applied to. 

For the most common frequentistic probability theory (Kolmogoroff) it is 
rather simple to define its profile, which is: 


{a; a; c; a}.


In addition, some other properties, i.e. that the events have to be dichotomous
etc., have to be assumed. For other probability theories it is already more difficult
to determine an appropriate profile. For fuzzy set theory the profile vector
will certainly depend on the operators used, on the type of membership function
assumed, on the scale level of the membership function etc. Or, putting it the
other way around, after the “uncertainty profile” of the uncertain situation has
been determined, that version of fuzzy set theory that matches the profile of the
situation has to be found.


Table 8-1. Rough taxonomy of uncertainty properties (not exhaustive, not disjoint).

1. Causes of (subjective) uncertainty
   (a) Lack of information
   (b) Abundance of information (complexity)
   (c) Conflicting evidence
   (d) Ambiguity
   (e) Measurement
   (f) Belief
2. Available information (input)
   (a) Numerical
   (b) Set- or interval-valued
   (c) Linguistic
   (d) Symbolic
3. Scale level of numerical information
   (a) Nominal
   (b) Ordinal
   (c) Cardinal
4. Required information (output)
   (a) Numerical
   (b) Set- or interval-valued
   (c) Linguistic
   (d) Symbolic
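
The idea of matching the profile of a theory against the profile of a situation can be sketched as a comparison of two 4-component vectors. This is a deliberately simplified illustration: the names `Profile` and `mismatches` and the example situation are hypothetical, each component holds a single letter from table 8-1, and a theory that admits several letters in one dimension is not modeled.

```python
from typing import NamedTuple


class Profile(NamedTuple):
    """Four-component uncertainty profile; letters refer to table 8-1."""
    cause: str      # 1. cause of uncertainty
    available: str  # 2. available information (input)
    scale: str      # 3. scale level of numerical information
    required: str   # 4. required information (output)


# Profile stated in the text for frequentistic probability theory.
FREQUENTIST_PROBABILITY = Profile("a", "a", "c", "a")


def mismatches(theory, situation):
    """Return the dimensions in which theory and situation differ."""
    return [name for name in Profile._fields
            if getattr(theory, name) != getattr(situation, name)]


# Hypothetical situation: lack of information, linguistic input,
# ordinal scale level, linguistic output required.
situation = Profile(cause="a", available="c", scale="b", required="c")

print(mismatches(FREQUENTIST_PROBABILITY, situation))
```

An empty list would indicate that, in this simplified sense, the theory fits the situation; each listed dimension points to a requirement of the theory that the situation does not meet.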

In the following we shall compare to a certain degree three formal theories 
that have been developed either to model uncertainty (e.g. probability) or which 
are recommended amongst other goals for uncertainty modeling: probability 
theory, possibility theory and fuzzy set theory. We will also consider some 
“hybrid” notions, i.e. terms in which two (formal) theories have been combined. 
Since L. Zadeh proposed the concept of a fuzzy set in 1965, the relationships 
between probability theory and fuzzy set theory have been further discussed. Both 
theories seem to be similar in the sense that both are concerned with some type 
of uncertainty and both use the [0, 1] interval for their measures as the range of
their respective functions (at least as long as one considers normalized fuzzy sets
only). Other uncertainty measures, which were already mentioned in chapter 4,
also focus on uncertainty and could therefore be included in such a discussion. 
The comparison between probability theory and fuzzy set theory is difficult 
primarily for two reasons: 


1. The comparison could be made on very different levels, that is, 
mathematically, semantically, linguistically, and so on. 

2. Fuzzy set theory is not or is no longer a uniquely defined mathematical struc- 
ture, such as Boolean algebra or dual logic. It is rather a very general family 
of theories (consider, for instance, all the possible operations defined in
chapter 3 or the different types of membership functions). In this respect, 
fuzzy set theory could rather be compared with the different existing 
theories of multivalued logic. 


Further, there does not yet exist and probably never will exist a unique context- 
independent definition of what fuzziness really means. On the other hand, neither 
is probability theory uniquely defined. There are different definitions and 
different linguistic appearances of “probability.” 

In recent years, some specific interpretations of fuzzy set theory have been 
suggested. One of them, possibility theory, used to correspond, roughly speak- 
ing, to the min-max version of fuzzy set theory—that is, to fuzzy set theory in 
which the intersection is modeled by the min-operator and the union by the max-operator.
This interpretation of possibility theory, however, is no longer correct.
Rather, possibility theory has been developed into a well-founded and comprehensive theory.
After the basic articles by L. Zadeh [1978, 1981], most of the advances in 
possibility theory have been due to Dubois and Prade. See, for instance, their 
excellent book on this topic [Dubois and Prade 1988]. 

We shall first describe the essentials of possibility theory and then compare it 
with other theories of uncertainty. 


8.2 Possibility Theory 
8.2.1 Fuzzy Sets and Possibility Distributions 


Possibility theory focuses primarily on imprecision, which is intrinsic in natural 
languages and is assumed to be “possibilistic” rather than probabilistic. There- 
fore the term variable is very often used in a more linguistic sense than in a 
strictly mathematical one. This is one reason why the terminology and the sym- 
bolism of possibility theory differ in some respects from those of fuzzy set theory. 
In order to facilitate the study of possibility theory, we will therefore use the 
common possibilistic terminology but will always show the correspondence to 
fuzzy set theory. 

Suppose, for instance, we want to consider the proposition “X is F,” where X 
is the name of an object, a variable, or a proposition. For instance, in “X is a small 
integer,” X is the name of a variable. In “John is young,” John is the name of an 
object. F (i.e., “small integer” or “young”) is a fuzzy set characterized by its membership
function $\mu_F$.

One of the central concepts of possibility theory is that of a possibility distribution
(as opposed to a probability distribution). In order to define a possibility
distribution, it is convenient first to introduce the notion of a fuzzy restriction. To
visualize a fuzzy restriction, the reader should imagine an elastic suitcase that
acts on the possible volume of its contents as a constraint. For a hardcover suitcase,
the volume is a crisp number. For a soft valise, the volume of its contents
depends to a certain degree on the force that is used to stretch it. The variable
in this case would be the volume of the valise; the values this variable can assume
may be $u \in U$, and the degree to which the variable (X) can assume different
values of u is expressed by $\mu_F(u)$. Zadeh [Zadeh et al. 1975, p. 2; Zadeh 1978,
p. 5] defines these relationships as follows.


Definition 8-2 


Let F be a fuzzy set of the universe U characterized by a membership function
$\mu_F(u)$. F is a fuzzy restriction on the variable X if F acts as an elastic constraint
on the values that may be assigned to X, in the sense that the assignment of the
values u to X has the form


$$X = u : \mu_F(u)$$


$\mu_F(u)$ is the degree to which the constraint represented by F is satisfied when u
is assigned to X. Equivalently, this implies that $1 - \mu_F(u)$ is the degree to which
the constraint has to be stretched in order to allow the assignment of the values
u to the variable X.

Whether a fuzzy set can be considered as a fuzzy restriction or not obviously 
depends on its interpretation: This is only the case if it acts as a constraint on 
the values of a variable, which might take the form of a linguistic term or a 
classical variable. 

Let R(X) be a fuzzy restriction associated with X, as defined in definition 8-2.
Then R(X) = F is called a relational assignment equation, which assigns the fuzzy 
set F to the fuzzy restriction R(X). 

Let us now assume that A(X) is an implied attribute of the variable X—for 
instance, A(X) = “age of Jack,” and F is the fuzzy set “young.” The proposition 
“Jack is young” (or better “the age of Jack is young”) can then be expressed as 


R(A(X)) = F 


Example 8-1 [Zadeh 1978, p. 5] 


Let p be the proposition “John is young,” in which “young” is a fuzzy set of the 
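universe U = [0, 100] characterized by the membership function

\[ \mu_{\text{young}}(u) = S(u; 20, 30, 40) \]

where u is the numerical age and the S-function is defined by

\[
S(u; \alpha, \beta, \gamma) =
\begin{cases}
1 & \text{for } u < \alpha \\
1 - 2\left(\dfrac{u - \alpha}{\gamma - \alpha}\right)^2 & \text{for } \alpha \le u \le \beta \\
2\left(\dfrac{u - \gamma}{\gamma - \alpha}\right)^2 & \text{for } \beta \le u \le \gamma \\
0 & \text{for } u > \gamma
\end{cases}
\]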


In this case, the implied attribute A(X) is Age (John), and the translation of “John 
is young” has the form 


John is young → R(Age(John)) = young


Zadeh [1978] related the concept of a fuzzy restriction to that of a possibility 
distribution as follows: 


Consider a numerical age, say u = 28, whose grade of membership in the fuzzy set 
“young” is approximately 0.7. First we interpret 0.7 as the degree of compatibility of 
28 with the concept labelled young. Then we postulate that the proposition “John is 
young” converts the meaning of 0.7 from the degree of compatibility of 28 with young 
to the degree of possibility that John is 28 given the proposition “John is young.” In 
short, the compatibility of a value of u with young becomes converted into the possi- 
bility of that value of u given “John is young” [Zadeh 1978, p. 6]. 
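The grade of roughly 0.7 quoted for u = 28 can be checked numerically. The following is a minimal sketch, assuming the S-function of example 8-1 has the usual piecewise-quadratic form (1 below α, 0 above γ, value 0.5 at β); the function names are illustrative and do not come from Zadeh's paper.

```python
def s_function(u, alpha, beta, gamma):
    """Decreasing S-shaped membership function (assumed form, see lead-in)."""
    if u < alpha:
        return 1.0
    if u <= beta:
        return 1.0 - 2.0 * ((u - alpha) / (gamma - alpha)) ** 2
    if u <= gamma:
        return 2.0 * ((u - gamma) / (gamma - alpha)) ** 2
    return 0.0


def mu_young(u):
    """Membership of the numerical age u in the fuzzy set 'young'."""
    return s_function(u, 20, 30, 40)


# Compatibility of age 28 with 'young', read as the possibility that John is 28.
print(round(mu_young(28), 2))  # 0.68, i.e. approximately 0.7
```

Under the proposition “John is young,” the same number is reinterpreted as a degree of possibility; the computation itself does not change.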


The concept of a possibility distribution can now be defined as follows: 


Definition 8-3 [Zadeh 1978, p. 6] 


Let F be a fuzzy set in a universe of discourse U that is characterized by its
membership function $\mu_F(u)$, which is interpreted as the compatibility of $u \in U$ with
the concept labeled F.

Let X be a variable taking values in U, and let F act as a fuzzy restriction,
R(X), associated with X. Then the proposition “X is F,” which translates into
R(X) = F, associates a possibility distribution, $\pi_X$, with X that is postulated to be
equal to R(X).

The possibility distribution function, $\pi_X(u)$, characterizing the possibility
distribution $\pi_X$ is defined to be numerically equal to the membership function
$\mu_F(u)$ of F, that is,

\[ \pi_X \triangleq \mu_F \]

The symbol $\triangleq$ will always stand for “denotes” or “is defined to be.” In order
to stay in line with the common symbol of possibility theory, we will denote
a possibility distribution with $\pi_X$ rather than with $\tilde{\pi}_X$, even though it is a fuzzy
set.


Example 8-2 [Zadeh 1978, p. 7] 


Let U be the universe of positive integers, and let F be the fuzzy set of small 
integers defined by 


F = {(1, 1), (2, 1), (3, .8), (4, .6), (5, .4), (6, .2)} 


Then the proposition “X is a small integer” associates with X the possibility
distribution

\[ \pi_X = F \]

in which a term such as (3, .8) signifies that the possibility that X is 3, given that
X is a small integer, is .8.

Even though definition 8-3 does not assert that our intuition of what we mean
by possibility agrees with the min-max fuzzy set theory, it might help to realize 
their common origin. It might also make more obvious the difference between 
possibility distribution and probability distribution. 

Zadeh [1978, p. 8] illustrates this difference by a simple but impressive 
example. 


Example 8-3 


Consider the statement “Hans ate X eggs for breakfast,” X = {1, 2, ...}. A
possibility distribution as well as a probability distribution may be associated with
X. The possibility distribution $\pi_X(u)$ can be interpreted as the degree of ease with
which Hans can eat u eggs, while the probability distribution might have been
determined by observing Hans at breakfast for 100 days. The values of $\pi_X(u)$ and
$P_X(u)$ might be as shown in the following table:

We observe that a high degree of possibility does not imply a high degree of
probability. If, however, an event is not possible, it is also improbable. Thus, in a way,
the possibility is an upper bound for the probability. A more detailed discussion
of this “possibility/probability consistency principle” can be found in Zadeh
[1978].

This principle is not intended as a crisp principle, from which exact probabilities
or possibilities can be computed, but rather as a heuristic principle, expressing
the principal relationship between possibilities and probabilities.
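The two observations above can be checked on a small numerical case. This is only a sketch: the values below are invented for illustration and are not the ones tabulated in the example.

```python
# Illustrative (assumed) distributions over u = 1..8 eggs.
possibility = {1: 1.0, 2: 1.0, 3: 1.0, 4: 0.9, 5: 0.7, 6: 0.4, 7: 0.2, 8: 0.0}
probability = {1: 0.2, 2: 0.7, 3: 0.1, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0}

# Heuristic reading: the possibility bounds the probability from above.
bounded = all(probability[u] <= possibility[u] for u in possibility)

# An impossible value is also improbable.
impossible_is_improbable = all(
    probability[u] == 0.0 for u in possibility if possibility[u] == 0.0
)

# High possibility does not force high probability: u = 4 is easy but never observed.
print(bounded, impossible_is_improbable, possibility[4], probability[4])
```

The check is one-directional by design: nothing in it derives probabilities from possibilities, which matches the heuristic status of the principle.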


8.2.2 Possibility and Necessity Measures 


In chapter 4, a possibility measure was already defined (definition 4-2) for the
case in which A is a crisp set. If A is a fuzzy set, a more general definition of a 
possibility measure has to be given [Zadeh 1978, p. 9]. 


Definition 8-4 


Let A be a fuzzy set in the universe U, and let $\pi_X$ be a possibility distribution
associated with a variable X that takes values in U. The possibility measure, $\pi(A)$,
of A is then defined by

\[
\operatorname{poss}\{X \text{ is } A\} \triangleq \pi(A) \triangleq \sup_{u \in U} \min\{\mu_A(u), \pi_X(u)\}
\]


Example 8-4 [Zadeh 1978] 


Let us consider the possibility distribution induced by the proposition “X is a 
small integer” (see example 8-2): 


\[ \pi_X = \{(1, 1), (2, 1), (3, .8), (4, .6), (5, .4), (6, .2)\} \]

and the crisp set A = {3, 4, 5}.
The possibility measure $\pi(A)$ is then

\[ \pi(A) = \max(.8, .6, .4) = .8 \]

If A, on the other hand, is assumed to be the fuzzy set “integers which are not
small,” defined as

\[ A = \{(3, .2), (4, .4), (5, .6), (6, .8), (7, 1), \ldots\} \]

then the possibility measure of “X is not a small integer” is

\[ \operatorname{poss}(X \text{ is not a small integer}) = \max\{.2, .4, .4, .2\} = .4 \]
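Both results of this example follow from the sup-min rule of definition 8-4. A minimal sketch for a finite universe, in which fuzzy sets are stored as dictionaries (a representation chosen here for illustration) and the set “not small” is truncated at 7, which is harmless because the possibility distribution is zero beyond 6:

```python
def possibility_measure(mu_a, pi_x):
    """sup over u of min(mu_A(u), pi_X(u)); missing entries count as 0."""
    universe = set(mu_a) | set(pi_x)
    return max(min(mu_a.get(u, 0.0), pi_x.get(u, 0.0)) for u in universe)


pi_small = {1: 1.0, 2: 1.0, 3: 0.8, 4: 0.6, 5: 0.4, 6: 0.2}
crisp_a = {3: 1.0, 4: 1.0, 5: 1.0}                       # crisp set {3, 4, 5}
not_small = {3: 0.2, 4: 0.4, 5: 0.6, 6: 0.8, 7: 1.0}     # truncated at 7

print(possibility_measure(crisp_a, pi_small))    # 0.8
print(possibility_measure(not_small, pi_small))  # 0.4
```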


As in probability theory, conditional possibilities also exist. Such a
conditional possibility distribution can be defined as follows [Zadeh 1981b, p. 81].


Definition 8-5


Let X and Y be variables in the universes U and V, respectively. The conditional
possibility distribution of Y given X is then induced by a proposition of the form
“If X is F, then Y is G” and is denoted by $\pi_{(Y/X)}(v/u)$.


Proposition 8-1


Let $\pi_X$ be the possibility distribution function of X and $\pi_{(Y/X)}$ the conditional
possibility distribution function of Y given X. The joint possibility distribution
function of X and Y, $\pi_{(X,Y)}$, is then given by

\[ \pi_{(X,Y)}(u, v) = \min\{\pi_X(u), \pi_{(Y/X)}(v/u)\} \]
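The min-combination of proposition 8-1 is easy to tabulate for finite universes. The sketch below uses invented values for the marginal and the conditional distribution; the labels u1, u2, v1, v2 are hypothetical.

```python
# Assumed marginal possibility distribution of X.
pi_x = {"u1": 1.0, "u2": 0.6}

# Assumed conditional possibility distribution of Y given X, keyed by (v, u).
pi_y_given_x = {
    ("v1", "u1"): 0.3, ("v2", "u1"): 1.0,
    ("v1", "u2"): 1.0, ("v2", "u2"): 0.5,
}

# Joint distribution: pi(u, v) = min(pi_X(u), pi_(Y/X)(v/u)).
joint = {
    (u, v): min(pi_x[u], pi_y_given_x[(v, u)])
    for u in pi_x
    for v in ("v1", "v2")
}

# Projecting the joint back onto X with max recovers pi_X here, because each
# conditional distribution in this example reaches the value 1.
marginal_x = {u: max(joint[(u, v)] for v in ("v1", "v2")) for u in pi_x}
print(joint)
print(marginal_x)
```

Going in the opposite direction, from the joint to a conditional distribution, is not unique under min; the sketch does not attempt it.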


The question of how to derive the conditional possibility distribution functions
from the joint possibility distribution function does not yet seem to be settled.
Different views on this question are presented by Zadeh [1981b, p. 82], Hisdal
[1978], and Nguyen [1978].

Fuzzy measures as defined in definition 4-2 express the degree to which a
certain subset of a universe, $\Omega$, or an event is possible. Hence, we have

\[ g(\emptyset) = 0 \quad \text{and} \quad g(\Omega) = 1 \]

As a consequence of condition 2 of definition 4-2, that is,

\[ A \subseteq B \Rightarrow g(A) \le g(B) \]

we have

\[ g(A \cup B) \ge \max(g(A), g(B)) \quad \text{and} \tag{8.1} \]
\[ g(A \cap B) \le \min(g(A), g(B)) \quad \text{for } A, B \subseteq \Omega \tag{8.2} \]

Possibility measures (definition 4-2) are defined for the limiting cases:

\[ \pi(A \cup B) = \max(\pi(A), \pi(B)) \tag{8.3} \]
\[ \pi(A \cap B) = \min(\pi(A), \pi(B)) \tag{8.4} \]

Table 8-2. Possibility functions.

If $\complement A$ is the complement of A in $\Omega$, then

\[ \pi(A \cup \complement A) = \max(\pi(A), \pi(\complement A)) = 1 \tag{8.5} \]

which expresses the fact that either A or $\complement A$ is completely possible.
In possibility theory, an additional measure is defined that uses the conjunctive
relationship and, in a sense, is dual to the possibility measure:

\[ N(A \cap B) = \min(N(A), N(B)) \tag{8.6} \]

N is then called the necessity measure. N(A) = 1 indicates that A is necessarily true
(A is sure). The dual relationship of possibility and necessity requires that

\[ \pi(A) = 1 - N(\complement A) \quad \forall A \subseteq \Omega \tag{8.7} \]

Necessity measures satisfy the condition

\[ \min(N(A), N(\complement A)) = 0 \tag{8.8} \]

The relationships between possibility measures and necessity measures satisfy
the following conditions [Dubois and Prade 1988, p. 10]:

\[ \pi(A) \ge N(A) \quad \forall A \subseteq \Omega \tag{8.9} \]
\[ N(A) > 0 \Rightarrow \pi(A) = 1, \qquad \pi(A) < 1 \Rightarrow N(A) = 0 \tag{8.10} \]

Here $\Omega$ is always assumed to be finite.


Example 8-5 


Let us assume that we know, from past experience, the performance of six
students in written examinations. Table 8-2 exhibits the possibility functions for the
grades A through E and students 1 through 6.

First we observe that the membership function for the grades of student 4 is
not a possibility function, since $g(\Omega) \ne 1$.
We can now ask different questions:


1. How reliable is the statement of student 1 that he will obtain a B in his next
   exam?
   In this case, “A” is {B} and “$\complement A$” is {A, C, D, E}.
   Hence, $\pi(A) = 1$ and

   \[ N(A) = \min_{i \in \complement A}\{1 - \pi_i\} = \min\{.2, .3, 1, 1\} = .2. \]

   Hence, the possibility of student 1 getting a B is $\pi = 1$, the necessity N = .2.
2. If we want to know the truth of the statement “Either student 1 or 2 will
   achieve an A or a B,” our $\Omega$ has to be defined differently. It now contains the
   elements of the first two rows. The result would be

   \[ \pi(A) = \pi(\text{student 1 A or B, or student 2 A or B}) = 1, \qquad N(A) = .3 \]

3. Let us finally determine the credibility of the statement “student 1 will get a C.”
   In this case

   \[ \pi(A) = .7, \qquad N(A) = 0. \]
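Questions 1 and 3 can be reproduced with the duality N(A) = 1 − π(∁A). The sketch below assumes the possibility function of student 1 is (.8, 1, .7, 0, 0) for the grades A through E; these values are inferred from the numbers used in the two answers above, and the function names are illustrative.

```python
def possibility(event, pi):
    """pi(A): maximum of the possibility function over the elements of A."""
    return max((pi[x] for x in event), default=0.0)


def necessity(event, pi):
    """N(A) = 1 - pi(complement of A)."""
    complement = set(pi) - set(event)
    return 1.0 - possibility(complement, pi)


# Possibility function of student 1 (inferred from the worked answers, see lead-in).
student_1 = {"A": 0.8, "B": 1.0, "C": 0.7, "D": 0.0, "E": 0.0}

for grade in ("B", "C"):
    event = {grade}
    print(grade, possibility(event, student_1), round(necessity(event, student_1), 2))
```

For both events the possibility is at least as large as the necessity, and the necessity is positive only where the possibility equals 1, as conditions (8.9) and (8.10) require.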


8.3 Probability of Fuzzy Events 


By now it should have become clear that possibility is not a substitute for prob- 
ability, but rather another kind of uncertainty. 

Let us now assume that an event is not crisply defined except by a possibility 
distribution (a fuzzy set) and that we are in a classical situation of stochastic 
uncertainty, that is, that the happening of this (fuzzily described) event is 
not certain and that we want to express the probability of its occurrence. Two
views on this probability can be adopted: Either this probability should be a scalar 
(measure) or this probability can be considered as a fuzzy set also. We shall 
consider both views briefly. 


8.3.1 Probability of a Fuzzy Event as a Scalar 


In classical probability theory, an event A is a member of a $\sigma$-field $\mathcal{A}$ of subsets
of a sample space $\Omega$. A probability measure P is a normalized measure over a
measurable space $(\Omega, \mathcal{A})$; that is, P is a real-valued function that assigns to every
A in $\mathcal{A}$ a probability P(A) such that


1. $P(A) \ge 0 \quad \forall A \in \mathcal{A}$
2. $P(\Omega) = 1$
3. If $A_i \in \mathcal{A}$, $i \in I \subset \mathbb{N}$, pairwise disjoint, then

\[ P\Bigl(\bigcup_{i \in I} A_i\Bigr) = \sum_{i \in I} P(A_i) \]

If $\Omega$ is, for instance, a Euclidean n-space and $\mathcal{A}$ the $\sigma$-field of Borel sets in $\mathbb{R}^n$,
then the probability of A can be expressed as

\[ P(A) = \int_A dP \]

If $\mu_A(x)$ denotes the characteristic function of a crisp set A and $E_P(\mu_A)$ the
expectation of $\mu_A(x)$, then

\[ P(A) = \int_{\Omega} \mu_A(x)\, dP = E_P(\mu_A) \]


If $\mu_A(x)$ does not denote the characteristic function of a crisp set but rather the
membership function of a fuzzy set, the basic definition of the probability of A
should not change. Zadeh [1968] therefore defined the probability of a fuzzy
event A (i.e., a fuzzy set A with membership function $\mu_A(x)$) as follows.


Definition 8-6


Let $(\mathbb{R}^n, \mathcal{A}, P)$ be a probability space in which $\mathcal{A}$ is the $\sigma$-field of Borel sets in $\mathbb{R}^n$
and P is a probability measure over $\mathbb{R}^n$. Then a fuzzy event in $\mathbb{R}^n$ is a fuzzy set A
in $\mathbb{R}^n$ whose membership function $\mu_A(x)$ is Borel measurable.

The probability of a fuzzy event A is then defined by the Lebesgue-Stieltjes
integral

\[ P(A) = \int_{\mathbb{R}^n} \mu_A(x)\, dP = E(\mu_A) \]
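On a finite sample space the integral reduces to a weighted sum, the expected value of the membership function. A minimal sketch under that discrete assumption, with invented data:

```python
def fuzzy_event_probability(mu, p):
    """Scalar probability of a fuzzy event: sum of mu_A(x) * P(x)."""
    return sum(mu[x] * p[x] for x in p)


p = {"x1": 0.2, "x2": 0.5, "x3": 0.3}        # assumed probabilities
mu_a = {"x1": 1.0, "x2": 0.5, "x3": 0.0}     # assumed fuzzy event
crisp = {"x1": 1.0, "x2": 1.0, "x3": 0.0}    # crisp event {x1, x2}

print(fuzzy_event_probability(mu_a, p))   # 0.2 + 0.25 + 0.0
print(fuzzy_event_probability(crisp, p))  # reduces to P({x1, x2})
```

For a characteristic function the sum is the ordinary probability of the crisp event, so the definition extends the classical one rather than replacing it.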


In Zadeh [1968] the similarity of the probability of fuzzy events and the proba- 
bility of crisp events is illustrated. His suggestions, though very plausible, were 
not yet axiomatically justified in 1968. Smets [1982] showed, however, that an 
axiomatic justification can be given for the case of crisp probabilities of fuzzy 
events within nonfuzzy environments. Other authors consider other cases, such 
as fuzzy probabilities, which we will not investigate in this book. 

We shall rather turn to the definition of the probability of a fuzzy event as a
fuzzy set, which corresponds quite well to some approaches we have discussed,
for example, for fuzzy integrals.


8.3.2 Probability of a Fuzzy Event as a Fuzzy Set 


In the following we shall consider sets with a finite number of elements. Let us
assume that there exists a probability measure P defined on the set of all crisp
subsets of (the universe) X, the Borel set. $P(x_i)$ shall denote the probability of
element $x_i \in X$.

Let $A = \{(x, \mu_A(x)) \mid x \in X\}$ be a fuzzy set representing a fuzzy event. The degree
of membership of element $x_i \in A$ is denoted by $\mu_A(x_i)$. $\alpha$-level sets or $\alpha$-cuts as
already defined in definition 2-3 shall be denoted by $A_\alpha$.

Yager [1979, 1984] suggests that it is quite natural to define the probability of
an $\alpha$-level set as $P(A_\alpha) = \sum_{x \in A_\alpha} P(x)$. On the basis of this, the probability of a fuzzy
event is defined as follows [Yager 1984].


Definition 8-7


Let $A_\alpha$ be the $\alpha$-level set of a fuzzy set A representing a fuzzy event. Then the
probability of fuzzy event A can be defined as

\[ P_Y(A) = \{(P(A_\alpha), \alpha) \mid \alpha \in [0, 1]\} \]

with the interpretation “the probability of at least an $\alpha$ degree of satisfaction to
the condition A.”

The subscript Y of $P_Y$ indicates that $P_Y$ is a definition of probability due to
Yager that differs from Zadeh’s definition, which is denoted by P. It should be
very clear that Yager considers $\alpha$, which is used as the degree of membership of
the probabilities $P(A_\alpha)$ in the fuzzy set $P_Y(A)$, as a kind of significance level for
the probability of a fuzzy event.
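For a finite fuzzy set the α-level sets change only at the membership grades that actually occur, so Yager's fuzzy probability can be listed level by level. The sketch assumes the α-level set is {x : μ(x) ≥ α}; the data and function names are invented for illustration.

```python
def alpha_cut(mu, alpha):
    """Crisp alpha-level set {x : mu(x) >= alpha}."""
    return {x for x, grade in mu.items() if grade >= alpha}


def yager_probability(mu, p):
    """Map each occurring level alpha > 0 to P(A_alpha)."""
    levels = sorted({grade for grade in mu.values() if grade > 0})
    return {alpha: sum(p[x] for x in alpha_cut(mu, alpha)) for alpha in levels}


mu_a = {"a": 1.0, "b": 0.5, "c": 0.25}   # assumed fuzzy event
p = {"a": 0.5, "b": 0.25, "c": 0.25}     # assumed probabilities

print(yager_probability(mu_a, p))  # {0.25: 1.0, 0.5: 0.75, 1.0: 0.5}
```

Each pair reads as “with significance level α, the probability is P(A_α)”: demanding a higher degree of satisfaction leaves fewer elements and hence a smaller probability.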

On the basis of private communication with Klement, Yager also suggests 
another definition for the probability of a fuzzy event, which is derived as follows. 


Definition 8-8 


The truth of the proposition “the probability of A is at least w” is defined as the
fuzzy set $P_Y^*(A)$ with the membership function

\[ P_Y^*(A)(w) = \sup_\alpha \{\alpha \mid P(A_\alpha) \ge w\}, \quad w \in [0, 1] \]

The reader should realize that now the “indicator” of significance of the
probability measure is w and no longer $\alpha$! The reader should also be aware of the fact
that we have used Yager’s terminology denoting the values of the membership
function by $P_Y^*(A)(w)$. This will facilitate reading Yager’s work [1984].

If we denote the complement of A by $\complement A = \{(x, 1 - \mu_A(x)) \mid x \in X\}$ and
the $\alpha$-level sets of $\complement A$ by $(\complement A)_\alpha$, then
$P_Y^*(\complement A)(w) = \sup_\alpha \{\alpha \mid P((\complement A)_\alpha) \ge w\}$,
$w \in [0, 1]$, can be interpreted as the truth of the proposition “the probability of
not A is at least w.”

Let us define $\bar{P}_Y^*(A) = 1 - P_Y^*(\complement A)$. If $\bar{P}_Y^*(A)(w)$ is interpreted as the truth of
the proposition “probability of A is at most w,” then we can argue as follows:
The “and” combination of “the probability of A is at least w” and “the probability
of A is at most w” might be considered as “the probability of A is exactly w.”
If $P_Y^*(A)$ and $\bar{P}_Y^*(A)$ are considered as possibility distributions, then their
conjunction is their intersection (modeled by applying the min-operator to the
respective membership functions). Hence the following definition [Yager 1984]:


Definition 8-9 [Yager 1984]


Let $P_Y^*(A)$ and $\bar{P}_Y^*(A)$ be defined as above. The possibility distribution associated
with the proposition “the probability of A is exactly w” can be defined as

\[ \bar{P}_Y(A)(w) = \min\{P_Y^*(A)(w), \bar{P}_Y^*(A)(w)\} \]


Example 8-6 


Let $A = \{(x_1, 1), (x_2, .7), (x_3, .6), (x_4, .2)\}$ be a fuzzy event with the probability
defined for the generic elements: $p_1 = .1$, $p_2 = .4$, $p_3 = .3$, and $p_4 = .2$; $p\{x_2\}$ is
.4, where the element $x_2$ belongs to the fuzzy event A with a degree of .7.

First we compute $P_Y^*(A)$. We start by determining the $\alpha$-level sets $A_\alpha$ for all
$\alpha \in [0, 1]$. Then we compute the probability of the crisp events $A_\alpha$ and give the
intervals of w for which $P(A_\alpha) \ge w$. We finally obtain $P_Y^*(A)$ as the respective
supremum of $\alpha$.

The computing is summarized in the following table:


$\alpha$        $A_\alpha$                      $P(A_\alpha)$   w          $P_Y^*(A) = \sup \alpha$
[0, .2]    {x_1, x_2, x_3, x_4}      1        [.8, 1]    .2
[.2, .6]   {x_1, x_2, x_3}           .8       [.5, .8]   .6
[.6, .7]   {x_1, x_2}                .5       [.1, .5]   .7
[.7, 1]    {x_1}                     .1       [0, .1]    1


Analogously, we obtain for $\bar{P}_Y^*(A) = 1 - P_Y^*(\complement A)$,


$\alpha$        $(\complement A)_\alpha$                 $P((\complement A)_\alpha)$   w          $P_Y^*(\complement A)$   $\bar{P}_Y^*(A) = 1 - P_Y^*(\complement A)$
0          {x_1, x_2, x_3, x_4}      1           [.9, 1]    0          1
[0, .3]    {x_2, x_3, x_4}           .9          [.5, .9]   .3         .7
[.3, .4]   {x_3, x_4}                .5          [.2, .5]   .4         .6
[.4, .8]   {x_4}                     .2          [0, .2]    .8         .2
[.8, 1]    $\emptyset$                         0           0          1          0


The probability $\bar{P}_Y(A)$ of the fuzzy event A is now determined by the
intersection of the fuzzy sets $P_Y^*(A)$ and $\bar{P}_Y^*(A)$ modeled by the min-operator as in
definition 8-9:

\[
\bar{P}_Y(A)(w) =
\begin{cases}
0, & w = 0 \\
.2, & w \in (0, .2] \\
.6, & w \in (.2, .8] \\
.2, & w \in (.8, 1]
\end{cases}
\]

Figure 8-2 illustrates the fuzzy sets $P_Y^*(A)(w)$, $\bar{P}_Y^*(A)(w)$, and $\bar{P}_Y(A)(w)$.
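The three fuzzy sets of this example can be recomputed mechanically. The sketch below uses the data of example 8-6, assumes that an α-level set is {x : μ(x) ≥ α}, and works with exact fractions so that comparisons at the interval end points are not disturbed by rounding; the function names are illustrative, not Yager's.

```python
from fractions import Fraction as Fr


def cut_probability(mu, p, alpha):
    """P(A_alpha) for the alpha-level set {x : mu(x) >= alpha}."""
    return sum(p[x] for x in mu if mu[x] >= alpha)


def at_least(mu, p, w):
    """Truth of 'the probability is at least w': sup{alpha : P(A_alpha) >= w}.

    P(A_alpha) only changes at membership grades, so the supremum is
    attained at one of the grades, at 0, or at 1.
    """
    candidates = set(mu.values()) | {Fr(0), Fr(1)}
    return max(a for a in candidates if cut_probability(mu, p, a) >= w)


def at_most(mu, p, w):
    """Truth of 'the probability is at most w', via the complement of A."""
    complement = {x: 1 - grade for x, grade in mu.items()}
    return 1 - at_least(complement, p, w)


def exactly(mu, p, w):
    """Min-combination of the two truth values (definition 8-9)."""
    return min(at_least(mu, p, w), at_most(mu, p, w))


mu_a = {"x1": Fr(1), "x2": Fr(7, 10), "x3": Fr(6, 10), "x4": Fr(2, 10)}
p = {"x1": Fr(1, 10), "x2": Fr(4, 10), "x3": Fr(3, 10), "x4": Fr(2, 10)}

for w in (Fr(0), Fr(1, 10), Fr(1, 2), Fr(9, 10)):
    print(float(w), float(exactly(mu_a, p, w)))
```

Evaluating at the interval boundaries w = .2 and w = .8 gives .2 and .6, respectively, which shows to which side each boundary belongs when the level sets are taken with “≥”.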


8.4 Possibility vs. Probability 


Questions concerning the relationship between fuzzy set theory and probability 
theory are very frequently raised, particularly by “newcomers” to the area of 
fuzzy sets. There are probably two major reasons for this. On the one hand, there 
are certain formal similarities between fuzzy set theory (in particular when using 
normalized fuzzy sets) and probability theory; on the other hand, in the past 
probabilities have been the only means for expressing “uncertainty.” It seems 
appropriate and helpful, therefore, to shed some more light on this question. 

In the introduction to this chapter, it was already mentioned that such a com- 
parison is difficult because of the lack of unique definitions of fuzzy sets. This 
lack of a unique definition is due in part to the variety of suggested possibilities 
for mathematically defining fuzzy sets as well as operations on them, as indicated 
in chapters 2 and 3. It is also due to the many different kinds of fuzziness that 
can be modeled with fuzzy sets, as described in chapter 1. 

Another problem is the selection of the aspects with respect to which these
theories shall be compared (see the introduction to this chapter).

Figure 8-2. Probability of a fuzzy event.


In section 8.2, possibility theory was briefly explained. There it was mentioned 
that possibility theory is more than the min-max version of fuzzy set theory. It 
was also shown that the “uncertainty measures” used in possibility theory are the 
possibility measure and the necessity measure, two measures that in a certain 
sense are dual to each other. In comparing possibility theory with probability 
theory, we shall first consider only possibility functions—and measures (neglect- 
ing the existence of dual measures)—of possibility theory. At the end of the 
chapter, we shall investigate the relationship between possibility theory and
probability theory.

Let us now turn to probabilities and try to characterize and classify available
notions of probabilities. Three aspects shall be of main concern:


1. The linguistic expression of probability.

2. The different information content of different types of probabilities.

3. The semantic interpretation of probabilities and its axiomatic and
   mathematical consequences.


Linguistically, we can distinguish explicit from implicit formulations of
probability. With respect to the information content, we can distinguish between
probabilities that are classificatory (given E, H is probable), comparative (given E, H
is more probable than K), partial (given E, the probability of K is in the interval
[a, b]), and quantitative (given E, the probability of H is b).

Finally, the interpretation of a probability can vary considerably. Let us
consider two very important and common interpretations of quantitative
probabilities. Koopman [1940, pp. 269-292] and Carnap and Stegmüller [1959] interpret
(subjective) probabilities essentially as degrees of truth of statements in dual 
logic. Axiomatically, Koopman derives a concept of probability, g, which math- 
ematically is a Boolean ring. 

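Kolmogoroff [1950] interprets probabilities “statistically.” He considers a
set $\Omega$ and an associated $\sigma$-algebra $\mathcal{F}$, the elements of which are interpreted as
events. On the basis of measure theory, he defines a (probability) function
$P: \mathcal{F} \to [0, 1]$ with the following properties:

\[ P: \mathcal{F} \to [0, 1] \tag{8.11} \]
\[ P(\Omega) = 1 \tag{8.12} \]
\[ \forall (X_i) \in \mathcal{F}\ (\forall i, j \in \mathbb{N}: i \ne j \Rightarrow X_i \cap X_j = \emptyset) \Rightarrow P\Bigl(\bigcup_{i \in \mathbb{N}} X_i\Bigr) = \sum_{i \in \mathbb{N}} P(X_i) \tag{8.13} \]

From these properties, the following relationships can easily be derived:

\[ X, \complement X \in \mathcal{F} \Rightarrow P(\complement X) = 1 - P(X) \tag{8.14} \]
\[ X, Y \in \mathcal{F} \Rightarrow P(X \cup Y) = P(X) + P(Y) - P(X \cap Y) \tag{8.15} \]

where $\complement X$ denotes the complement of X.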
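The two derived relationships can be verified on a small finite probability space. The sketch assumes a fair die with equally likely outcomes as the sample space; the events X and Y are arbitrary choices for illustration.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # outcomes of one fair die (assumed example)


def prob(event):
    """Uniform probability of a crisp event, as an exact fraction."""
    return Fraction(len(OMEGA & set(event)), len(OMEGA))


X, Y = {1, 2, 3}, {3, 4}

# Complement rule: P(complement of X) = 1 - P(X).
complement_rule = prob(OMEGA - X) == 1 - prob(X)

# Union rule: P(X u Y) = P(X) + P(Y) - P(X n Y).
union_rule = prob(X | Y) == prob(X) + prob(Y) - prob(X & Y)

print(complement_rule, union_rule)  # True True
```

The complement rule ties the probability of an event rigidly to that of its complement; possibility measures do not share this property.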

Table 8-3 illustrates the difference between Koopman’s and Kolmogoroff’s
concept of probability, taking into account the different linguistic and
informational possibilities mentioned above.

Table 8-3. Koopman’s vs. Kolmogoroff’s probabilities.

Koopman:      D, D′, H, H′ are statements of dual logic; Q is a nonnegative
              real number (generally Q ∈ [0, 1]).
Kolmogoroff:  W is a set of events; $W_i$ are subsets of W.

Classificatory
  Koopman      1. Implicit: D supports H.
               2. Explicit: H is probable on the basis of D.
  Kolmogoroff  1. $W_i$ is a nonempty subset of W.
               2. If one throws the dice W times, probably no $W_i$ is empty.

Comparative
  Koopman      1. Implicit: D supports H more than D′ supports H′.
               2. Explicit: H is more probable given D than H′ is, given D′.
  Kolmogoroff  1. For W times one throws the dice, $W_i$ is of equal size as $W_j$.
               2. If one throws a coin W times, $W_i$ is as probable as $W_j$.

Quantitative
  Koopman      1. The degree of support for H on the basis of D is Q.
               2. The probability for H given D is Q.
  Kolmogoroff  1. The ratio of the number of events in $W_i$ and W is Q.
               2. The probability that the result of throwing a dice is Z when
                  throwing the dice M times is $Q_i$.

Now we are ready to compare “fuzzy sets” with “probabilities,” or at least one
certain version of fuzzy set theory with one of probability theory. Implicit
probabilities are not comparable to fuzzy sets, since fuzzy set models try particularly
to model uncertainty explicitly. Comparative and partial probabilities are more
comparable to probabilistic statements using “linguistic variables,” which we will
cover in chapter 9.


Hence, the most frequently used versions we shall compare now are
quantitative, explicit Kolmogoroff probabilities with possibilities.
Table 8-4 depicts some of the main mathematical differences between three
areas that are similar in many respects.

Table 8-4. Relationship between Boolean algebra, probabilities, and possibilities.

                                  Boolean algebra    Probabilities              Possibilities
                                                     (quantitative, explicit)
Domain                            Set of (logic)     $\sigma$-algebra                 Any universe X
                                  statements
Range of values (membership)      {0, 1}             [0, 1]                     [0, 1]; fuzzy: $0 < \mu < \infty$, real
Special constraints                                  $\sum p(u) = 1$
Union (independent,               max                $\sum$                        max
noninteractive)
Intersection                      min                $\prod$                       min
Conditional equal to joint?       yes                no                         often
What can be used for inference?   conditional        conditional or joint       conditional, often joint

Let us now return to the “duality” aspect of possibility measures and
necessity measures.

A probability measure, P(A), satisfies the additivity axiom, that is, $\forall A, B \subseteq \Omega$
for which $A \cap B = \emptyset$:

\[ P(A \cup B) = P(A) + P(B) \tag{8.16} \]

This measure is monotonic in the sense of condition 2 of definition 4-2.
Equation (8.16) is the probabilistic equivalent to (8.1) and (8.2).
The possibility theory conditions (8.5) and (8.8) imply

\[ N(A) + N(\complement A) \le 1 \tag{8.17} \]
\[ \pi(A) + \pi(\complement A) \ge 1 \tag{8.18} \]

which is less stringent than the equivalent relation

\[ P(A) + P(\complement A) = 1 \tag{8.19} \]

of probability theory.

In this sense, possibility corresponds more to evidence theory [Shafer 1976] 
than to classical probability theory, in which the probabilities of an element (a 
subset) are uniquely related to the probability of the contrary element (comple- 
ment). In Shafer’s theory, which is probabilistic in nature, this relationship is 
also relaxed by introducing an “upper probability” and a “lower probability,” 
which are as “dual” to each other as are possibility and necessity. 

In fact, possibility and necessity measures can be considered as limiting cases 
of probability measures in the sense of Shafer, that is, 


\[ N(A) \le P(A) \le \pi(A) \quad \forall A \subseteq \Omega \tag{8.20} \]


This in turn links intuitively again with Zadeh’s “possibility/probability consis- 
tency principle” mentioned in section 8.2.1. 
Concerning the theories considered in this chapter, we can conclude the
following. Fuzzy set theory, possibility theory, and probability theory are not
substitutes for one another; they complement each other. While fuzzy set theory has quite a
number of “degrees of freedom” with respect to intersection and union operators, 
kinds of fuzzy sets (membership functions), etc., the latter two theories are well 
developed and uniquely defined with respect to operation and structure. Fuzzy 
set theory seems to be more adaptable to different contexts. This, of course, 
also implies the need to adapt the theory to a context if one wants it to be an 
appropriate modeling tool. 


Exercises 


1. Let U and F be defined as in example 8-2. Determine the possibility distri- 
bution associated with the statement “X is not a small integer.” 

2. Define a probability distribution and a possibility distribution that could be 
associated with the proposition “cars drive X mph on American freeways.” 

3. Compute the possibility measures (definition 8-4) for the following
   possibility distributions:


   A = {6, 7, ..., 13, 14}
   “X is an integer close to 10”
   $\pi_A = \{(8, .6), (9, .8), (10, 1), (11, .8), (12, .6)\}$
   or alternatively,
   $\pi_A = \{(6, .4), (7, .5), (8, .6), (9, .8), (10, 1), (11, .8), (12, .6), (13, .5), (14, .4)\}$


Discuss the results. 

4. Discuss the relationships between general measures, fuzzy measures, prob- 
ability measures, and possibility measures. 

5. Determine Yager’s probability of a fuzzy event for the event “X is an integer 
close to 10” as defined in exercise 3 above. 

6. List examples for each of the kinds of probabilistic statements given in table 
8-3. 

7. Analyze and discuss the assertion that $\bar{P}_Y^*(A)(w)$ can be interpreted as the
   truth of the proposition “the probability of A is at most w.”


II APPLICATIONS OF
FUZZY SET THEORY


Applications of fuzzy set theory can already be found in many different areas. 
One could probably classify those applications as follows: 


1. Applications to mathematics, that is, generalizations of traditional mathe- 
matics such as topology, graph theory, algebra, logic, and so on. 

2. Applications to algorithms such as clustering methods, control algorithms, 
mathematical programming, and so on. 

3. Applications to standard models such as “the transportation model,” “inven- 
tory control models,” “maintenance models,” and so on. 

4. Finally, applications to real-world problems of different kinds. 


In this book, the first type of “applications” will be covered by looking at fuzzy 
logic and approximate reasoning. The second type of applications will be illus- 
trated by considering fuzzy clustering, fuzzy linear programming, and fuzzy 
dynamic programming. The third type will be covered by looking at fuzzy ver- 
sions of standard operations research models and at multicriteria approaches. The 
fourth type, finally, will be illustrated on the one hand by describing operations
research (OR) models as well as empirical research in chapter 15. On the
other hand, chapter 10 is devoted entirely to fuzzy control and expert
systems, the area in which fuzzy set theory has probably been applied to the
largest extent and which is also closest to real applications.


9 FUZZY LOGIC 
AND APPROXIMATE 
REASONING 


9.1 Linguistic Variables 


In retreating from precision in the face of overpowering complexity, it is natural 
to explore the use of what might be called linguistic variables, that is, variables whose 
values are not numbers but words or sentences in a natural or artificial language. 

The motivation for the use of words or sentences rather than numbers is that lin- 
guistic characterizations are, in general, less specific than numerical ones [Zadeh 
1973a, p. 3]. 


This quotation presents in a nutshell the motivation and justification for fuzzy 
logic and approximate reasoning. Another quotation might be added, which is 
much older. The philosopher B. Russell noted: 


All traditional logic habitually assumes that precise symbols are being employed. It is 
therefore not applicable to this terrestrial life but only to an imagined celestial exis- 
tence [Russell 1923]. 


One of the basic tools for fuzzy logic and approximate reasoning is the notion of 
a linguistic variable that in 1973 was called a variable of higher order rather than 
a fuzzy variable and defined as follows [Zadeh 1973a, p. 75].


Definition 9-1


A linguistic variable is characterized by a quintuple (x, T(x), U, G, M) in which 
x is the name of the variable; T(x) (or simply T) denotes the term set of x, that 
is, the set of names of linguistic values of x, with each value being a fuzzy vari- 
able denoted generically by X and ranging over a universe of discourse U that is 
associated with the base variable u; G is a syntactic rule (which usually has the 
form of a grammar) for generating the name, X, of values of x; and M is a seman- 
tic rule for associating with each X its meaning, M(X), which is a fuzzy subset 
of U. A particular X—that is, a name generated by G—is called a term. It should 
be noted that the base variable u can also be vector valued. 

In order to facilitate the symbolism in what follows, some symbols will have 
two meanings wherever clarity allows this: x will denote the name of the vari- 
able (“the label”) and the generic name of its values. The same will be true for 
X and M(X). 


Example 9-1 [Zadeh 1973a, p. 77] 


Let X be a linguistic variable with the label “Age” (i.e., the label of this variable
is “Age,” and the values of it will also be called “Age”) with U = [0, 100].
Terms of this linguistic variable, which are again fuzzy sets, could be called
“old,” “young,” “very old,” and so on. The base variable u is the age in years
of life. M(X) is the rule that assigns a meaning, that is, a fuzzy set, to the
terms:

\[ M(\text{old}) = \{(u, \mu_{\text{old}}(u)) \mid u \in [0, 100]\} \]

where

\[
\mu_{\text{old}}(u) =
\begin{cases}
0 & u \in [0, 50] \\
\left(1 + \left(\dfrac{u - 50}{5}\right)^{-2}\right)^{-1} & u \in (50, 100]
\end{cases}
\]

T(x) will define the term set of the variable x, for instance, in the case 


T(Age) = {old, very old, not so old, more or less young, 
quite young, very young} 


where G(x) is a rule that generates the (labels of) terms in the term set. 
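The semantic rule for the term “old” can be evaluated directly. A minimal sketch, assuming the membership function is 0 up to age 50 and $(1 + ((u - 50)/5)^{-2})^{-1}$ above it; the function name is illustrative.

```python
def mu_old(u):
    """Membership of age u (in years, 0..100) in the fuzzy set 'old'."""
    if not 0 <= u <= 100:
        raise ValueError("age must lie in the universe [0, 100]")
    if u <= 50:
        return 0.0
    return 1.0 / (1.0 + ((u - 50) / 5.0) ** -2)


for age in (50, 55, 60, 100):
    print(age, round(mu_old(age), 3))
```

With this form the grade is exactly 0.5 at age 55 and approaches, but never reaches, 1 at age 100.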
Figure 9-1 sketches another way to represent the linguistic variable “Age.”
Two linguistic variables of particular interest in fuzzy logic and in (fuzzy)
probability theory are “Truth” and “Probability.” The linguistic variable
“Probability” is shown as an example in figure 9-2.

Figure 9-3. Linguistic variable “Truth.”


The term set of the linguistic variable “Truth” has been defined differently by
different authors. Baldwin [1979, p. 316] defines some of the terms as shown in
figure 9-3. Here,

\[ \mu_{\text{very true}}(v) = (\mu_{\text{true}}(v))^2, \quad v \in [0, 1] \]
\[ \mu_{\text{fairly true}}(v) = (\mu_{\text{true}}(v))^{1/2}, \quad v \in [0, 1] \]

and so on. Zadeh [1973a, p. 99] suggests for the term true the membership function

\[
\mu_{\text{true}}(v) =
\begin{cases}
0 & \text{for } 0 \le v \le a \\
2\left(\dfrac{v - a}{1 - a}\right)^2 & \text{for } a \le v \le \dfrac{a + 1}{2} \\
1 - 2\left(\dfrac{v - 1}{1 - a}\right)^2 & \text{for } \dfrac{a + 1}{2} \le v \le 1
\end{cases}
\]

where v = (1 + a)/2 is called the crossover point, and $a \in [0, 1]$ is a parameter
that indicates the subjective judgment about the minimum value of v in order to
consider a statement as “true” at all.

The membership function of “false” is considered as the mirror image of
“true,” that is,

\[ \mu_{\text{false}}(v) = \mu_{\text{true}}(1 - v), \quad 0 \le v \le 1 \]

Figure 9-4 [Zadeh 1973a, p. 99] shows the terms true and false.

Figure 9-4. Terms “True” and “False.”
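The mirror-image relation between “true” and “false” can be made concrete. The sketch below assumes the piecewise-quadratic S-shaped form commonly attributed to Zadeh for “true” (0 up to a, rising through 0.5 at the crossover point (1 + a)/2, reaching 1 at v = 1); the parameter value a = 0.6 is an arbitrary choice for illustration and must be below 1.

```python
def mu_true(v, a=0.6):
    """Assumed S-shaped membership function of 'true' with threshold a < 1."""
    if v <= a:
        return 0.0
    if v <= (a + 1) / 2:
        return 2 * ((v - a) / (1 - a)) ** 2
    return 1 - 2 * ((v - 1) / (1 - a)) ** 2


def mu_false(v, a=0.6):
    """'false' as the mirror image of 'true' about v = 0.5."""
    return mu_true(1 - v, a)


for v in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(v, round(mu_true(v), 3), round(mu_false(v), 3))
```

With a = 0.6 the crossover point of “true” lies at v = 0.8 and that of “false” at v = 0.2; truth values around 0.5 belong to neither term.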

Of course, the membership functions of true and false, respectively, can also 
be chosen from the finite universe of truth values. The term set of the linguistic 
variable “Truth” is then defined as [Zadeh 1973a, p. 99] 


T(Truth) = {true, not true, very true, not very true, ..., false, not false, 
very false, . . . , not very true and not very false, ... } 


The fuzzy sets (possibility distributions) of those terms can essentially be
derived from the term true or the term false by applying the modifiers (hedges)
defined below.
Definition 9-2


A linguistic variable x is called structured if the term set T(x) and the meaning 
M(x) can be characterized algorithmically. For a structured linguistic variable, 
M(x) and T(x) can be regarded as algorithms that generate the terms of the term 
set and associate meanings with them. 

Before we illustrate this by an example, we need to define what we mean by 
a “hedge” or a “modifier.” 


Definition 9-3 


A linguistic hedge or a modifier is an operation that modifies the meaning of a 
term or, more generally, of a fuzzy set. If A is a fuzzy set, then the modifier m 
generates the (composite) term B = m(A). 

Mathematical models frequently used for modifiers are as follows: 


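concentration: $\mu_{\text{con}(A)}(u) = (\mu_A(u))^{2}$
dilation: $\mu_{\text{dil}(A)}(u) = (\mu_A(u))^{1/2}$
contrast intensification:

$$\mu_{\text{int}(A)}(u) = \begin{cases} 2(\mu_A(u))^{2} & \text{for } \mu_A(u) \in [0, .5] \\ 1 - 2(1 - \mu_A(u))^{2} & \text{otherwise} \end{cases}$$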


Generally, the following linguistic hedges (modifiers) are associated with the
above-mentioned mathematical operators:
If A is a term (a fuzzy set), then


very A = con(A)
more or less A = dil(A)
plus A = $A^{1.25}$
slightly A = int[plus A and not (very A)]


where “and” is interpreted possibilistically. 
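The modifiers above translate directly into operations on membership degrees. The following Python sketch represents a discrete fuzzy set as a dictionary from elements to membership degrees; the representation, the function names, and the sample set `old` are illustrative assumptions. The exponent 1.25 for “plus” follows Zadeh's usual definition, and “and” and “not” are taken as minimum and complement.

```python
import math
from typing import Dict

FuzzySet = Dict[float, float]  # element -> membership degree in [0, 1]


def con(a: FuzzySet) -> FuzzySet:
    """Concentration: square every membership degree."""
    return {u: m ** 2 for u, m in a.items()}


def dil(a: FuzzySet) -> FuzzySet:
    """Dilation: take the square root of every membership degree."""
    return {u: math.sqrt(m) for u, m in a.items()}


def intensify(a: FuzzySet) -> FuzzySet:
    """Contrast intensification: push degrees away from 0.5."""
    return {u: 2 * m ** 2 if m <= 0.5 else 1 - 2 * (1 - m) ** 2
            for u, m in a.items()}


def very(a: FuzzySet) -> FuzzySet:
    return con(a)


def more_or_less(a: FuzzySet) -> FuzzySet:
    return dil(a)


def plus(a: FuzzySet) -> FuzzySet:
    """'plus A' with the assumed exponent 1.25."""
    return {u: m ** 1.25 for u, m in a.items()}


def slightly(a: FuzzySet) -> FuzzySet:
    """int[plus A and not (very A)] with 'and' = min and 'not' = 1 - mu."""
    p, v = plus(a), very(a)
    return intensify({u: min(p[u], 1 - v[u]) for u in a})


# Hypothetical discrete version of the term "old".
old = {60: 0.25, 70: 0.5, 80: 0.75, 90: 1.0}
```

For the sample set, `very(old)` lowers the degree of age 80 from 0.75 to 0.5625, while `more_or_less(old)` raises the degree of age 60 from 0.25 to 0.5.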


Example 9-2 [Zadeh 1973a, p. 83] 


Let us reconsider from example 9-1 the linguistic variable “Age.” The term set 
shall be assumed to be 


T(Age) = {old, very old, very very old, ... } 


The term set can now be generated recursively by using the following rule 
(algorithm): 




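$$T^{i+1} = \{\text{old}\} \cup \{\text{very } T^{i}\}$$
that is,
$$T^{0} = \emptyset$$
$$T^{1} = \{\text{old}\}$$
$$T^{2} = \{\text{old, very old}\}$$
$$T^{3} = \{\text{old, very old, very very old}\}$$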


For the semantic rule, we only need to know the meaning of “old” and the 
meaning of the modifier “very” in order to determine the meaning of an arbitrary 
term of the term set. If one defines “very” as the concentration, then the terms of 
the term set of the structured linguistic variable “Age” can be determined, given 
that the membership function of the term “old” is known. 
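Both rules of a structured linguistic variable are easy to sketch in code. In the following Python example, `term_set` implements the recursive syntactic rule and `meaning` implements the semantic rule with “very” modeled as concentration. The function names and the dictionary representation of the membership function are illustrative assumptions.

```python
from typing import Dict, List


def term_set(depth: int, primary: str = "old") -> List[str]:
    """Return T^depth for the rule T^(i+1) = {primary} U {very t : t in T^i}."""
    terms: List[str] = []  # T^0 is the empty set
    for _ in range(depth):
        terms = [primary] + ["very " + t for t in terms]
    return terms


def meaning(term: str, mu_primary: Dict[float, float]) -> Dict[float, float]:
    """Semantic rule: every leading 'very' squares the membership degrees."""
    k = 0
    while term.startswith("very "):
        term = term[len("very "):]
        k += 1
    # k nested concentrations raise the degree to the power 2**k.
    return {u: m ** (2 ** k) for u, m in mu_primary.items()}
```

For instance, `term_set(3)` yields the three terms of T^3, and the meaning of “very very old” raises each membership degree of “old” to the fourth power.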


Definition 9-4 [Zadeh 1973a, p. 87] 


A Boolean linguistic variable is a linguistic variable whose terms, X, are Boolean
expressions in variables of the form $X_p$, $m(X_p)$, where $X_p$ is a primary term and m
is a modifier. $m(X_p)$ is a fuzzy set resulting from acting with m on $X_p$.


Example 9-3 


Let “Age” be a Boolean linguistic variable with the term set 


T(Age) = {young, not young, old, not old, very young,
not young and not old, young or old, ... }


Identifying “and” with the intersection, “or” with the union, “not” with the
complementation, and “very” with the concentration, we can derive the meaning of
different terms of the term set as follows:


M(not young) = $\neg\,\text{young}$
M(not very young) = $\neg\,(\text{young})^{2}$
M(young or old) = $\text{young} \cup \text{old}$, etc.


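Given the two fuzzy sets (primary terms)

$$M(\text{young}) = \{(u, \mu_{\text{young}}(u)) \mid u \in [0, 100]\}$$

where

$$\mu_{\text{young}}(u) = \begin{cases} 1 & u \in [0, 25] \\ \left(1 + \left(\dfrac{u - 25}{5}\right)^{2}\right)^{-1} & u \in (25, 100] \end{cases}$$

and

$$M(\text{old}) = \{(u, \mu_{\text{old}}(u)) \mid u \in [0, 100]\}$$

where

$$\mu_{\text{old}}(u) = \begin{cases} 0 & u \in [0, 50] \\ \left(1 + \left(\dfrac{u - 50}{5}\right)^{-2}\right)^{-1} & u \in (50, 100] \end{cases}$$

then the membership function of the term “young or old” would, for instance, be

$$\mu_{\text{young or old}}(u) = \begin{cases} 1 & \text{if } u \in [0, 25] \\ \left(1 + \left(\dfrac{u - 25}{5}\right)^{2}\right)^{-1} & \text{if } u \in (25, 50] \\ \max\left\{\left(1 + \left(\dfrac{u - 25}{5}\right)^{2}\right)^{-1}, \left(1 + \left(\dfrac{u - 50}{5}\right)^{-2}\right)^{-1}\right\} & \text{if } u \in (50, 100] \end{cases}$$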
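The meaning of a composite term such as “young or old” can be evaluated pointwise. The following Python sketch assumes the membership functions customarily used for “young” and “old” in Zadeh's examples (equal to 1 up to age 25 and equal to 0 up to age 50, respectively, with a scale of 5 years) and interprets “or” as the maximum; the function names are illustrative.

```python
def mu_young(u: float) -> float:
    """Assumed membership function of 'young' on [0, 100]."""
    if not 0 <= u <= 100:
        raise ValueError("age must lie in [0, 100]")
    if u <= 25:
        return 1.0
    return 1.0 / (1.0 + ((u - 25) / 5) ** 2)


def mu_old(u: float) -> float:
    """Assumed membership function of 'old' on [0, 100]."""
    if not 0 <= u <= 100:
        raise ValueError("age must lie in [0, 100]")
    if u <= 50:
        return 0.0
    return 1.0 / (1.0 + ((u - 50) / 5) ** -2)


def mu_young_or_old(u: float) -> float:
    """'or' interpreted as union, that is, the pointwise maximum."""
    return max(mu_young(u), mu_old(u))
```

Under these assumptions the composite term has a pronounced dip in middle age: the degree is 1 at age 20, 0.5 at age 30, about 0.04 at age 50, and back to 0.5 at age 55.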











9.2 Fuzzy Logic 
9.2.1 Classical Logics Revisited 


Logics as bases for reasoning can be distinguished essentially by their three topic- 
neutral (context-independent) items: truth values, vocabulary (operators), and 
reasoning procedure (tautologies, syllogisms). 

In Boolean logic, truth values can be 0 (false) or 1 (true), and by means of 
these truth values, the vocabulary (operators) is defined via truth tables. 

Let us consider two statements, A and B, either of which can be true or 
false, that is, have the truth value 1 or 0. We can construct the following truth 
tables: 
A   B  |  ∧   ∨   ⊻   ⇒   ⇔   ?
1   1  |  1   1   0   1   1   1
1   0  |  0   1   1   0   0   1
0   1  |  0   1   1   1   0   0
0   0  |  0   0   0   1   1   0


There are $2^{2^2} = 16$ truth tables, each defining an operator. Assigning meanings
(words) to these operators is not difficult for the first 4 or 5 columns: the first 
obviously characterizes the “and,” the second the “inclusive or,” the third the 
“exclusive or,” and the fourth and fifth the implication and the equivalence. We 
will have difficulties, however, interpreting the remaining nine columns in terms 
of our language. If we have three statements rather than two, this task of assign- 
ing meanings to truth tables becomes even more difficult. 

So far it has been assumed that each statement, A and B, could clearly be clas- 
sified as true or false. If this is no longer true, then additional truth values, such 
as “undecided” or a similar description, can and have to be introduced, which 
leads to the many existing systems of multivalued logic. It is not difficult to see 
how the above-mentioned problems of two-valued logic in “calling” truth tables 
or operators increase as we move to multivalued logic. For only two statements 
and three possible truth values, there are already $3^{3^2} = 3^9 = 19{,}683$ truth tables! The
uniqueness of interpretation of truth tables, which is so convenient in Boolean 
logic, disappears immediately because many truth tables in three-valued logic 
look very much alike. 

The third topic-neutral item of logical systems is the reasoning procedure 
itself, which is generally based on tautologies such as 


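modus ponens: $(A \land (A \Rightarrow B)) \Rightarrow B$

modus tollens: $((A \Rightarrow B) \land \neg B) \Rightarrow \neg A$
syllogism: $((A \Rightarrow B) \land (B \Rightarrow C)) \Rightarrow (A \Rightarrow C)$
contraposition: $(A \Rightarrow B) \Rightarrow (\neg B \Rightarrow \neg A)$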
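That these four formulas are tautologies can be confirmed by enumerating all truth assignments. The following Python sketch does so for Boolean variables A, B, and C; the helper names `implies` and `is_tautology` are illustrative.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication of two-valued logic."""
    return (not a) or b


TAUTOLOGIES = {
    "modus ponens": lambda a, b, c: implies(a and implies(a, b), b),
    "modus tollens": lambda a, b, c: implies(implies(a, b) and not b, not a),
    "syllogism": lambda a, b, c: implies(
        implies(a, b) and implies(b, c), implies(a, c)),
    "contraposition": lambda a, b, c: implies(
        implies(a, b), implies(not b, not a)),
}


def is_tautology(formula) -> bool:
    """True if the formula holds for every assignment of A, B, and C."""
    return all(formula(a, b, c)
               for a, b, c in product([False, True], repeat=3))
```

A formula such as (A ⇒ B) ⇒ (B ⇒ A), by contrast, fails for A false and B true and is therefore rejected by `is_tautology`.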


Let us consider the modus ponens, which could be interpreted as: “If A is true 
and if the statement ‘If A is true then B is true’ is also true, then B is true.” 

The term true is used at different places and in two different senses: All but 
the last “trues” are material trues, that is, they are taken as a matter of fact, while 
the last “true” is a topic-neutral logical “true.” In Boolean logic, however, these
“trues” are all treated the same way [see Mamdani and Gaines 1981, p. xv]. A 
distinction between material and logical (necessary) truth is made in so-called 
extended logics: Modal logic [Hughes and Cresswell 1968] distinguishes between 
necessary and possible truth, and tense logic between statements that were true 
in the past and those that will be true in the future. Epistemic logic deals with
knowledge and belief and deontic logic with what ought to be done and what 
is permitted to be true. Modal logic, in particular, might be a very good basis 
for applying different measures and theories of uncertainty, as indicated in 
chapter 4. 

Another extension of Boolean logic is predicate calculus, which is a set theo- 
retic logic using quantifiers (all, etc.) and predicates in addition to the operators 
of Boolean logic. 

Fuzzy logic [Zadeh 1973a, p. 101] is an extension of set-theoretic multival- 
ued logic in which the truth values are linguistic variables (or terms of the lin- 
guistic variable truth). 

Since operators such as $\lor$, $\land$, $\neg$, and $\Rightarrow$ in fuzzy logic are also defined by using truth
tables, the extension principle can be applied to derive definitions of the opera- 
tors. So far, possibility theory (see section 8.1) has primarily been used in order 
to define operators in fuzzy logic, even though other operators have also been 
investigated (see, for instance, Mizumoto and Zimmermann [1982]), and could 
also be used. In this book, we will limit considerations to possibilistic interpre- 
tations of linguistic variables, and we will also stick to the original proposals of 
Zadeh [1973a]. To the interested reader, however, we suggest supplemental study 
of alternative approaches such as those by Baldwin [1979], Baldwin and 
Pilsworth [1980], Giles [1979, 1980], and others. 

If v(A) is a point in V = [0, 1], representing the truth value of the proposition 
“u is A” or simply A, then the truth value of not A is given by 


$$v(\text{not } A) = 1 - v(A)$$


Definition 9-5 


If v(A) is a normalized fuzzy set, $v(A) = \{(v_i, \mu_i) \mid i = 1, \ldots, n,\ v_i \in [0, 1]\}$,
then by applying the extension principle, the truth value of v(not A) is defined
as


$$v(\text{not } A) = \{(1 - v_i, \mu_i) \mid i = 1, \ldots, n,\ v_i \in [0, 1]\}$$

In particular, “false” is interpreted as “not true,” that is,


$$v(\text{false}) = \{(1 - v_i, \mu_i) \mid i = 1, \ldots, n,\ v_i \in [0, 1]\}$$


Example 9-4 


Let us consider the terms true and false, respectively, defined as the following 
possibility distributions: 
v(true) = {(.5, .6), (.6, .7), (.7, .8), (.8, .9), (.9, 1), (1, 1)}
v(false) = v(not true) = {(.5, .6), (.4, .7), (.3, .8), (.2, .9), (.1, 1), (0, 1)}


Then


v(very true) = {(.5, .36), (.6, .49), (.7, .64), (.8, .81), (.9, 1), (1, 1)}
v(very false) = {(.5, .36), (.4, .49), (.3, .64), (.2, .81), (.1, 1), (0, 1)}
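The computations of this example can be reproduced mechanically. The following Python sketch stores a fuzzy truth value as a dictionary from truth values to degrees of possibility and uses exact fractions to avoid rounding noise; the helper names are illustrative assumptions. “Not” is obtained by the extension principle (each v becomes 1 - v), and “very” squares the degrees of possibility.

```python
from fractions import Fraction
from typing import Dict, Tuple

TruthValue = Dict[Fraction, Fraction]  # truth value v -> possibility degree


def truth_value(*pairs: Tuple[str, str]) -> TruthValue:
    """Build a fuzzy truth value from (v, mu) pairs written as decimal strings."""
    return {Fraction(v): Fraction(mu) for v, mu in pairs}


def fuzzy_not(tv: TruthValue) -> TruthValue:
    """Extension principle for negation: (v, mu) becomes (1 - v, mu)."""
    return {1 - v: mu for v, mu in tv.items()}


def very(tv: TruthValue) -> TruthValue:
    """Concentration: square the degrees, keep the truth values."""
    return {v: mu ** 2 for v, mu in tv.items()}


true = truth_value(("0.5", "0.6"), ("0.6", "0.7"), ("0.7", "0.8"),
                   ("0.8", "0.9"), ("0.9", "1"), ("1", "1"))
false = fuzzy_not(true)
```

Applying `very` to `true` and to `false` gives the two possibility distributions listed at the end of the example.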


It has already been mentioned that fuzzy logic is essentially considered as an 
application of possibility theory to logic. Hence the logical operators “and,” “or,”
and “not” are defined accordingly. 


Definition 9-6 
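For numerical truth values v(A) and v(B), the logical operations and, or, not, and
implied are defined as

$$v(A) \land v(B) = v(A \land B) = \min\{v(A), v(B)\}$$
$$v(A) \lor v(B) = v(A \lor B) = \max\{v(A), v(B)\}$$
$$\neg v(A) = 1 - v(A)$$
$$v(A) \Rightarrow v(B) = v(A \Rightarrow B) = \neg v(A) \lor v(B) = \max\{1 - v(A), v(B)\}$$


If

$$v(A) = \{(v_i, \alpha_i)\}, \quad \alpha_i \in [0, 1],\ v_i \in [0, 1]$$
$$v(B) = \{(w_j, \beta_j)\}, \quad \beta_j \in [0, 1],\ w_j \in [0, 1]$$
$$i = 1, \ldots, n; \quad j = 1, \ldots, m$$

then

$$v(A \text{ and } B) = v(A) \land v(B) = \left\{\left(u = \min\{v_i, w_j\},\ \max_{u = \min\{v_i, w_j\}} \min\{\alpha_i, \beta_j\}\right) \,\middle|\, i = 1, \ldots, n;\ j = 1, \ldots, m\right\}$$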


(This is equivalent to the intersection of two type 2 fuzzy sets.) The other oper- 
ators are defined accordingly. 
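The extension of “and” to fuzzy truth values can be sketched as a double loop over the two possibility distributions. In the following Python example, the generic helper `extend` and the dictionary representation are illustrative assumptions; `fuzzy_or` is obtained in the same way by replacing the minimum of the truth values with the maximum.

```python
from itertools import product
from typing import Callable, Dict

TruthValue = Dict[float, float]  # truth value -> degree of possibility


def extend(op: Callable[[float, float], float],
           a: TruthValue, b: TruthValue) -> TruthValue:
    """Extension principle: the degree of u is the maximum, over all pairs
    (v, w) with op(v, w) == u, of min(alpha, beta)."""
    out: TruthValue = {}
    for (v, alpha), (w, beta) in product(a.items(), b.items()):
        u = op(v, w)
        out[u] = max(out.get(u, 0.0), min(alpha, beta))
    return out


def fuzzy_and(a: TruthValue, b: TruthValue) -> TruthValue:
    return extend(min, a, b)


def fuzzy_or(a: TruthValue, b: TruthValue) -> TruthValue:
    return extend(max, a, b)


true = {0.5: 0.6, 0.6: 0.7, 0.7: 0.8, 0.8: 0.9, 0.9: 1.0, 1.0: 1.0}
false = {0.5: 0.6, 0.4: 0.7, 0.3: 0.8, 0.2: 0.9, 0.1: 1.0, 0.0: 1.0}
```

With the terms of example 9-4, `fuzzy_and(true, false)` reproduces “false” and `fuzzy_or(true, false)` reproduces “true,” because every truth value of “true” is at least as large as every truth value of “false.”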


Example 9-5 


Let v(A) = true = {(.5, .6), (.6, .7), (.7, .8), (.8, .9), (.9, 1), (1, 1)}. 
Then 
$\neg v(A)$ = {(0, 1), (.1, 1), (.2, 1), (.3, 1), (.4, 1), (.5, .4), (.6, .3), (.7, .2),
(.8, .1)}


9.2.2 Linguistic Truth Tables 


As mentioned at the beginning of this section, binary connectives (operators) in 
classical two- and many-valued logics are normally defined by the tabulation of 
truth values in truth tables. In fuzzy logic, the number of truth values is, in 
general, infinite. Hence tabulation of the truth values for operators is not possi- 
ble. We can, however, tabulate truth values, that is, terms of the linguistic vari- 
able “Truth,” for a finite number of terms, such as true, not true, very true, false, 
more or less true, and so on. 

Zadeh [1973a, p. 109] suggests truth tables for the determination of truth 
values for operators using a four-valued logic including the truth values true, 
false, undecided, and unknown. “Unknown” is then interpreted as “true or false”
(T + F), and “undecided” is denoted by θ.

Extending the normal Boolean logic with truth values true (1) and false (0) to
a (fuzzy) three-valued logic (true = T, false = F, unknown = T + F), with a
universe of truth values being two-valued (true and false), we obtain the following
truth tables, in which the first column contains the truth values for a statement A
and the first row those for a statement B [Zadeh 1973a, p. 116]:





∧     |  T      F      T+F
T     |  T      F      T+F
F     |  F      F      F
T+F   |  T+F    F      T+F


Truth table for “and”


∨     |  T      F      T+F
T     |  T      T      T
F     |  T      F      T+F
T+F   |  T      T+F    T+F


Truth table for “or”


A     |  ¬A
T     |  F
F     |  T
T+F   |  T+F


Truth table for “not”
If the number of truth values (terms of the linguistic variable truth) increases,
one can still “tabulate” the truth table for operators by using definition 9-6 as
follows: Let us assume that the ith row of the table represents “not true” and the
jth column “more or less true.” The (i, j)th entry in the truth table for “and” would
then contain the entry for “not true ∧ more or less true.” The resulting fuzzy set
would, however, most likely not correspond to any fuzzy set assigned to the terms
of the term set of “truth.” In this case, one could try to find the fuzzy set of the
term that is most similar to the fuzzy set resulting from the computations. Such
a term would then be called a linguistic approximation. This is analogous to
statistics, where empirical distribution functions are often approximated by
well-known standard distribution functions.


Example 9-6 


Let V = {0, .1, .2,..., 1} be the universe, 

true = {(.8, .9), (.9, 1), (1, 1)},

more or less true = {(.6, .2), (.7, .4), (.8, .7), (.9, 1), (1, 1)}, and
almost true = {(.8, .9), (.9, 1), (1, .8)}.


Let “more or less true” be the ith row and “almost true” the jth column of the
truth table for “or.”
Then “more or less true ∨ almost true” is the (i, j)th entry in the table:


more or less true ∨ almost true
= {(.6, .2), (.7, .4), (.8, .7), (.9, 1), (1, 1)} ∨ {(.8, .9), (.9, 1), (1, .8)}
= {(.6, .2), (.7, .4), (.8, .9), (.9, 1), (1, 1)}


Now we can approximate the right-hand side of this equation by
true = {(.8, .9), (.9, 1), (1, 1)}
This yields
“more or less true ∨ almost true” = “true.”


Baldwin [1979] suggests another version of fuzzy logic—fuzzy truth tables, and 
their determination: The truth values on which he bases his suggestions were 
shown graphically in figure 9-3. They were defined as 


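$$\text{true} = \{(v, \mu_{\text{true}}(v) = v) \mid v \in [0, 1]\}$$
$$\text{false} = \{(v, \mu_{\text{false}}(v) = 1 - \mu_{\text{true}}(v)) \mid v \in [0, 1]\}$$
$$\text{very true} = \{(v, (\mu_{\text{true}}(v))^{2}) \mid v \in [0, 1]\}$$
$$\text{fairly true} = \{(v, (\mu_{\text{true}}(v))^{1/2}) \mid v \in [0, 1]\}$$
$$\text{undecided} = \{(v, 1) \mid v \in [0, 1]\}$$

Very false and fairly false were defined correspondingly, and


$$\text{absolutely true} = \{(v, \mu_{at}(v)) \mid v \in [0, 1]\} \quad \text{with} \quad \mu_{at}(v) = \begin{cases} 1 & \text{for } v = 1 \\ 0 & \text{otherwise} \end{cases}$$

$$\text{absolutely false} = \{(v, \mu_{af}(v)) \mid v \in [0, 1]\} \quad \text{with} \quad \mu_{af}(v) = \begin{cases} 1 & \text{for } v = 0 \\ 0 & \text{otherwise} \end{cases}$$


Hence


$(\text{very})^{k}$ true → absolutely true as k → ∞
$(\text{very})^{k}$ false → absolutely false as k → ∞
$(\text{fairly})^{k}$ true → undecided as k → ∞
$(\text{fairly})^{k}$ false → undecided as k → ∞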


Using figure 9-3 and the interpretations of “and” and “or” as minimum and 
maximum, respectively, the following truth table results [Baldwin 1979, p. 318]: 


v(P)         v(Q)          v(P and Q)    v(P or Q)
false        false         false         false
true         false         false         true
true         true          true          true
undecided    false         false         undecided
undecided    true          undecided     true
undecided    undecided     undecided     undecided
true         very true     true          very true
true         fairly true   fairly true   true

Some more considerations and assumptions are needed to derive the truth table
for the implication. Baldwin considers his fuzzy logic to rest on two pillars: the
denumerably infinite multivalued logic system of Lukasiewicz and fuzzy
set theory:


Implication statements are treated by a composition of fuzzy truth value restrictions
with a Lukasiewicz logic implication relation on a fuzzy truth space. Set theoretic
considerations are used to obtain fuzzy truth value restrictions from conditional fuzzy
linguistic statements using an inverse truth functional modification procedure. Finally
truth functional modification is used to obtain the final conclusion [Baldwin 1979, p. 309].
9.3 Approximate and Plausible Reasoning

We already mentioned that in traditional logic the main tools of reasoning are
tautologies, such as the modus ponens, that is, $(A \land (A \Rightarrow B)) \Rightarrow B$, or


Premise        A is true
Implication    If A then B
--------------------------
Conclusion     B is true


Here A and B are crisply defined statements or propositions; the A’s in the premise 
and the implication are identical, and so are the B’s in the implication and con- 
clusion. The “implication” is defined via truth tables, as shown in section 9.2.1. 
Approximate and plausible reasoning are ways of drawing conclusions from
hypotheses. They relax the stringent assumptions of dual logic even further than
fuzzy logic does and try to approach human reasoning even more closely.
Three natural generalizations of the classical modus ponens are


1. To modify the definition of the “implication,”

2. To allow statements that are no longer crisp but contain a fuzzy set, such as
linguistic variables, and

3. To relax the identity of the A’s and B’s in the premise, rule, and conclusion
by substituting for “identical” the term “similar.”


Relaxations of point 2 lead to “approximate reasoning,” and relaxations of points 
2 and 3 lead to “plausible reasoning.” 

We shall first briefly consider point 1 and then turn to points 2 and 3. 

The rule “if A then B” is often written as A → B. The symbol “→” is then often
interpreted as implication, whose meaning is formally defined in logic. Obviously,
there are two “translations” between the three different levels involved: the
linguistic level (rule), the symbolic level (→), and the formal logical level.

The relationship between the linguistic expression “if A then B” and the 
respective mathematical description cannot be derived formally, but only em- 
pirically. This problem belongs in the area of psycholinguistics, and empirical 
research in this direction is still very rare [Spiess 1989]. 

If “A → B” is interpreted as material implication, in which A is called the
premise and B the consequence, then the truth values v(A), v(B), and v(A → B)
can in dual logic be either 0 or 1. As shown in the truth table in section 9.2.1, the
truth value of v(A → B) is 0 if A is true and B is false; otherwise, its truth value
is 1. This corresponds to the view that the implication is true whenever the
consequence is at least as true as the premise. In Boolean logic, A → B is
equivalent to ¬A ∨ (A ∧ B) (not A or (A and B)).
On the basis of these relationships, various implication operators have
been defined. Ruan [1991] has investigated 18 of these definitions, which are all
restricted to the min-max theory. We only show a selection of them in the next
table. x denotes the degree of truth (or degree of membership) of the premise, y
the respective value for the consequence, and I the resulting degree of truth for
the implication.


Name                     Definition of Implication Operator
Early Zadeh              $I(x, y) = \max(1 - x, \min(x, y))$
Lukasiewicz              $I(x, y) = \min(1, 1 - x + y)$
Minimum (Mamdani)        $I(x, y) = \min(x, y)$
Standard Star (Gödel)    $I(x, y) = 1$ if $x \le y$; $y$ elsewhere
Kleene-Dienes            $I(x, y) = \max(1 - x, y)$
Gaines                   $I(x, y) = 1$ if $x \le y$; $y/x$ elsewhere
Yager                    $I(x, y) = y^{x}$
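The operators of this table are simple enough to compare side by side in code. The following Python sketch collects them in a dictionary; the dictionary name and the helper `boolean_table` are illustrative, and the Yager operator is taken in its usual form, y raised to the power x, which is an assumption wherever the exponent is not legible in the table.

```python
IMPLICATIONS = {
    "Early Zadeh": lambda x, y: max(1 - x, min(x, y)),
    "Lukasiewicz": lambda x, y: min(1, 1 - x + y),
    "Minimum (Mamdani)": lambda x, y: min(x, y),
    "Standard Star (Goedel)": lambda x, y: 1 if x <= y else y,
    "Kleene-Dienes": lambda x, y: max(1 - x, y),
    "Gaines": lambda x, y: 1 if x <= y else y / x,
    "Yager": lambda x, y: y ** x,  # assumed form; note 0 ** 0 == 1 in Python
}


def boolean_table(op) -> list:
    """Values of op on the crisp inputs (0,0), (0,1), (1,0), (1,1)."""
    return [op(x, y) for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]]


# Operators that reduce to the classical implication on {0, 1}.
CLASSICAL = [name for name, op in IMPLICATIONS.items()
             if boolean_table(op) == [1, 1, 0, 1]]
```

On crisp inputs all operators except the minimum operator reproduce the Boolean truth table of the implication; the minimum operator yields 0 whenever the premise is 0, which is why it is often regarded as a conjunction rather than an implication in the logical sense.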


The “quality” of these implication operators could again be evaluated either 
empirically or axiomatically. For the latter, a well-accepted axiomatic system 
such as that of Smets and Magrez [1987] can be used. The authors assume that 
the implication operator is truth functional, i.e., that the truth of “A — B” only 
depends on the truth of A and B. They have formulated the following axioms: 


1. $v(A \to B) = v(\neg B \to \neg A)$
(contrapositive symmetry)

2. $v(A \to (B \to C)) = v(B \to (A \to C))$
(exchange principle)

3. $v(A \to B) \ge v(C \to D)$ if
$v(A) \le v(C)$ and/or $v(B) \ge v(D)$
(monotonicity)

4. $v(A \to B) = 1$ if $v(A) \le v(B)$
(boundary condition)

5. $v(T \to A) = v(A)$, where T stands for tautology
(neutrality principle)

6. $v(A \to B)$ is continuous in its arguments
(continuity)


Table 9-1 shows which of the implication operators satisfy (Y) or violate (N) the 
above axioms. 
Table 9-1. Formal quality of implication operators.


                           Early                          Standard   Kleene-
                           Zadeh   Lukasiewicz   Mamdani  Star       Dienes    Gaines   Yager
A1 Contraposition          N       Y             N        N          Y         N        N
A2 Exchange principle      N       Y             Y        Y          Y         N        Y
A3 Monotonicity            N       Y             N        Y          Y         Y        Y
A4 Boundary condition      N       Y             N        Y          N         Y        N
A5 Neutrality principle    Y       Y             Y        Y          Y         Y        Y
A6 Continuity              Y       Y             Y        N          Y         N        N

If one uses the fraction of the axioms that are satisfied by the various impli- 
cations as their degree of membership in the fuzzy set “good implication opera- 
tors,” then one would obtain the following fuzzy set: 


Good Implication Operators =
{(Early Zadeh, 1/3), (Lukasiewicz, 1), (Mamdani, 1/2), (Standard Star, 2/3),
(Kleene-Dienes, 5/6), (Gaines, 1/2), (Yager, 1/2)}
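The degrees of membership in “good implication operators” follow from table 9-1 by counting. The following Python sketch encodes each row of the table as a string of Y and N entries; it assumes that the columns appear in the same order as the operators in the list of definitions (Early Zadeh first, Yager last). The names `OPERATORS` and `TABLE_9_1` are illustrative.

```python
from fractions import Fraction

OPERATORS = ["Early Zadeh", "Lukasiewicz", "Mamdani", "Standard Star",
             "Kleene-Dienes", "Gaines", "Yager"]

# One string per axiom; position k refers to OPERATORS[k] (assumed order).
TABLE_9_1 = {
    "A1 contraposition": "NYNNYNN",
    "A2 exchange principle": "NYYYYNY",
    "A3 monotonicity": "NYNYYYY",
    "A4 boundary condition": "NYNYNYN",
    "A5 neutrality principle": "YYYYYYY",
    "A6 continuity": "YYYNYNN",
}


def good_implication_operators() -> dict:
    """Fraction of satisfied axioms per operator."""
    n_axioms = len(TABLE_9_1)
    return {
        name: Fraction(sum(row[k] == "Y" for row in TABLE_9_1.values()),
                       n_axioms)
        for k, name in enumerate(OPERATORS)
    }
```

Only the Lukasiewicz operator satisfies all six axioms and thus reaches degree 1; the Kleene-Dienes operator follows with 5/6.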


For approximate and plausible reasoning as defined above, the modus ponens is 
extended to the “generalized modus ponens” [Zadeh 1973a, p. 56; Mizumoto et 
al. 1979; Mamdani 1977a]. 


Example 9-7 


Let A, A′, B, B′ be fuzzy statements; then the generalized modus ponens reads


Premise:       x is A′
Implication:   If x is A, then y is B
-------------------------------------
Conclusion:    y is B′


For instance,


Premise:       This tomato is very red.
Implication:   If a tomato is red then the tomato is ripe.
----------------------------------------------------------
Conclusion:    This tomato is very ripe.


It should be mentioned, however, that the generalized modus ponens alone does
not allow us to obtain conclusions from unequal premises. Such an inference
presupposes or necessitates knowledge about modifications of the premises and their
consequences (for example, knowledge that an increase in “redness” indicates an
increase in “ripeness”) [Dubois and Prade 1984b, p. 325].

In 1973, Zadeh suggested the compositional rule of inference for the above- 
mentioned type of fuzzy conditional inference. In the meantime, other authors 
(for instance, Baldwin [1979]; Baldwin and Pilsworth [1980]; Baldwin and Guild 
[1980]; Mizumoto et al. [1979]; Mizumoto and Zimmermann [1982]; Tsukamoto 
[1979]), have suggested different methods and have also investigated the modus 
tollens, syllogism, and contraposition. In this book, however, we shall restrict 
considerations to Zadeh’s compositional rule of inference. 


Definition 9-7 [Zadeh 1973a, p. 148] 


Let R(x), R(x, y), and R(y), x ∈ X, y ∈ Y, be fuzzy relations in X, X × Y, and Y,
respectively, that act as fuzzy restrictions on x, (x, y), and y, respectively. Let A
and B denote particular fuzzy sets in X and X × Y. Then the compositional rule
of inference asserts that the solution of the relational assignment equations (see
definition 8-1) R(x) = A and R(x, y) = B is given by R(y) = A ∘ B, where A ∘ B is
the composition of A and B.


Example 9-8 


Let the universe be X = {1, 2, 3, 4}. 
A = little = {(1, 1), (2, .6), (3, .2), (4, 0)}. 
R = “approximately equal” be a fuzzy relation defined by 


1 2 3 4 
For the formal inference, denote

$$R(x) = A, \quad R(x, y) = B, \quad \text{and} \quad R(y) = A \circ B$$

Applying the max-min composition for computing $R(y) = A \circ B$ yields


$$R(y) = \max_{x} \min\{\mu_A(x), \mu_B(x, y)\}$$
$$= \{(1, 1), (2, .6), (3, .5), (4, .2)\}$$


A possible interpretation of the inference may be the following: 


x is little
x and y are approximately equal
-------------------------------
y is more or less little
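The max-min composition of this example is a short computation. In the following Python sketch the fuzzy set “little” is taken from the example, whereas the relation matrix is an assumption: “approximately equal” is given degree 1 for x = y, degree .5 for |x - y| = 1, and degree 0 otherwise, a choice that is consistent with the result stated above. The function and variable names are illustrative.

```python
from typing import Dict, Tuple


def max_min_composition(a: Dict[int, float],
                        r: Dict[Tuple[int, int], float]) -> Dict[int, float]:
    """R(y) = max over x of min(mu_A(x), mu_R(x, y))."""
    ys = sorted({y for (_, y) in r})
    return {y: max(min(mu_x, r.get((x, y), 0.0)) for x, mu_x in a.items())
            for y in ys}


little = {1: 1.0, 2: 0.6, 3: 0.2, 4: 0.0}

# Assumed relation "approximately equal" on {1, 2, 3, 4}.
approximately_equal = {
    (x, y): {0: 1.0, 1: 0.5}.get(abs(x - y), 0.0)
    for x in range(1, 5) for y in range(1, 5)
}

more_or_less_little = max_min_composition(little, approximately_equal)
```

The composition spreads the membership of “little” to neighboring values: the degree of y = 3 rises from .2 to .5 and that of y = 4 from 0 to .2, which motivates the label “more or less little.”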


A direct application of approximate reasoning is the fuzzy algorithm (an 
ordered sequence of instructions in which some of the instructions may contain 
labels of fuzzy sets) and the fuzzy flow chart. We shall consider both in more 
detail in chapter 10. Here, however, we shall briefly describe fuzzy (formal) 
languages. 


9.4 Fuzzy Languages 


Fuzzy languages are formal languages based on fuzzy logic and approximate rea- 
soning. Several of them have been developed by now, such as LPL [Adamo 
1980], FLIP [Giles 1980], Fuzzy Planner [Kling 1973], and others. They are based 
on LP1, FORTRAN, LISP, and other programming languages and differ in their 
content as well as their aims. Here we shall sketch a meaning-representation
language developed by Zadeh [Zadeh 1981a].

PRUF (acronym for Possibilistic Relational Universal Fuzzy) is a meaning- 
representation language for natural languages and is based on possibility theory. 
PRUF may be employed as a language for the presentation of imprecise knowl- 
edge and as a means of making precise the fuzzy propositions expressed in a 
natural language. In essence, PRUF bears the same relationship to fuzzy logic 
that predicate calculus does to two-valued logic. Thus it serves to translate a set 
of premises expressed in natural language into expressions in PRUF to which the 
rules of inference of fuzzy logic or approximate reasoning may be applied. This 
yields other expressions in PRUF that can then be retranslated into natural lan- 
guage and become the conclusions inferred from the original premises. 

The main constituents of PRUF are 


1. a collection of translation rules, and 
2. a set of rules of inference. 
The latter corresponds essentially to fuzzy logic and approximate reasoning, as
described in sections 9.2 and 9.3. The former will be described in more detail 
after the kind of representation in PRUF has been described and some more def- 
initions introduced. 

In definition 8-2, the relational assignment equation was defined. In PRUF, a
possibility distribution $\pi_X$ is assigned via the

possibility assignment equation (PAE): $\pi_X = F$

to the fuzzy set F. The PAE corresponds to a proposition of the form “N is F”
where N is the name of a variable, a fuzzy set, a proposition, or an object. For
simplicity, the PAE will be written as in chapter 8 as


$$\pi_X = F$$


Example 9-9 


Let N be the proposition “Peter is old”; then N (the variable) is called “Peter,”
X ∈ [0, 100] is the linguistic variable “Age,” “old” is, for instance, a term of the
term set of “Age,” and


Peter is old → $\pi_{\text{Age(Peter)}}$ = old


where → stands for “translates into.”
There are two special types of possibility distributions that will be needed later. 


Definition 9-8 
The possibility distribution $\pi_I$ with
$$\pi_I(u) = 1 \quad \text{for } u \in U$$
is called the unity possibility distribution I, and $\pi_\perp$ with
$$\pi_\perp(v) = v \quad \text{for } v \in [0, 1]$$


is called the unitary possibility distribution function [Zadeh 1981a, p. 10].

In chapter 6 (definition 6-4), the projection of a binary fuzzy relation was
defined. This definition holds not only for binary relations and numerical values
of the related variables but also for linguistic variables.

Different fuzzy relations in a product space $U_1 \times U_2 \times \cdots \times U_n$ can have
identical projections on $U_{i_1} \times \cdots \times U_{i_k}$. Given a fuzzy relation $R_q$ in
$U_{i_1} \times \cdots \times U_{i_k}$, there exists, however, a unique relation $R_{qL}$ that contains all
other relations whose projection on $U_{i_1} \times \cdots \times U_{i_k}$ is $R_q$. $R_{qL}$ is then called the
cylindrical extension of $R_q$; the latter is the basis of $R_{qL}$ (see definitions 6-4, 6-5).
In PRUF, the operation “particularization” is also important: “By the particularization
of a fuzzy relation or a possibility distribution which is associated with a variable
$X = (X_1, \ldots, X_n)$, is meant the effect of specification of the possibility distributions of one
or more subvariables (terms) of X. Particularization in PRUF may be viewed as the
result of forming the conjunction of a proposition of the form “X is F,” where X is an
n-ary variable with particularizing propositions of the form “$X_s$ = G,” where $X_s$ is a
subvariable (term) of X and F and G, respectively, are fuzzy sets in $U_1 \times U_2 \times \cdots \times U_n$ and
$U_{i_1} \times \cdots \times U_{i_k}$, respectively” [Zadeh 1981a, p. 13].


Definition 9-9 [Zadeh 1981a, p. 13] 


Let $\pi_X = \pi_{(X_1, \ldots, X_n)} = F$ and $\pi_{X_s} = \pi_{(X_{i_1}, \ldots, X_{i_k})} = G$ be possibility distributions
induced by the propositions “X is F” and “$X_s$ is G,” respectively. The
particularization of $\pi_X$ by $X_s = G$ is denoted by $\pi_X(\pi_{X_s} = G)$ and is defined as the
intersection of F and G′, that is,


$$\pi_X(\pi_{X_s} = G) = F \cap G'$$


where G’ is the cylindrical extension of G. 


Example 9-10 


Consider the proposition “Porsche is an attractive car,” where attractiveness of a 
car as a function of mileage and top speed is defined in the following table. 


Attractive cars    Top speed (mph)    Mileage (mpg)    μ
                   60                 30               .4
                   60                 35               .5
                   60                 40               .6
                   70                 30               .7
                   85                 25               .7
                   90                 25               .8
                   95                 25               .9
                   100                20               1.0
                   110                15               1.0


A particularizing proposition is “Porsche is a fast car,” in which “fast” is defined 
in the following table: 
Fast cars    Top speed (mph)    μ
             60
             70
             85
             90
             95
             100
             110


“Porsche is an attractive car” can equivalently be written as “Porsche is a fast 
car,” that is, “Top speed (Porsche) is high” and “mileage (Porsche) is high.” 

Using definition 9-9, the particularized relation attractive ($\pi_{\text{Top speed}}$ = fast) can
readily be computed, as shown in the next table:


Attractive cars    Top speed (mph)    Mileage (mpg)    μ
                   60                 30               .4
                   60                 35               .4
                   60                 40               .4
                   70                 30               .6
                   85                 25               .7
                   90                 25               .8
                   95                 25               .9
                   100                20               .95
                   110                15               1
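Particularization amounts to a pointwise minimum after the cylindrical extension of the particularizing fuzzy set. The following Python sketch reads the μ column of “attractive” as .4, .5, ..., 1.0. The degrees of “fast” are hypothetical: they are chosen only so that the result agrees with the particularized table, since several values of the original “fast” table cannot be inferred from that result. All names are illustrative.

```python
from typing import Dict, Tuple

# (top speed in mph, mileage in mpg) -> degree of "attractive"
attractive: Dict[Tuple[int, int], float] = {
    (60, 30): 0.4, (60, 35): 0.5, (60, 40): 0.6,
    (70, 30): 0.7, (85, 25): 0.7, (90, 25): 0.8,
    (95, 25): 0.9, (100, 20): 1.0, (110, 15): 1.0,
}

# Hypothetical degrees of "fast" as a function of top speed (assumption).
fast: Dict[int, float] = {
    60: 0.4, 70: 0.6, 85: 0.7, 90: 0.8, 95: 0.9, 100: 0.95, 110: 1.0,
}


def particularize(f: Dict[Tuple[int, int], float],
                  g: Dict[int, float]) -> Dict[Tuple[int, int], float]:
    """F intersected with G', where G'(speed, mileage) = G(speed) is the
    cylindrical extension of G; the intersection is the minimum."""
    return {(speed, mileage): min(mu, g[speed])
            for (speed, mileage), mu in f.items()}


attractive_and_fast = particularize(attractive, fast)
```

The slow but economical cars lose attractiveness once “fast” is imposed: all three entries with a top speed of 60 mph drop to .4.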


Translation Rules in PRUF. The following types of fuzzy expressions will be 
considered: 


1. Fuzzy propositions such as “All students are young,” “X is much larger than 
Y,” and “If Hans is healthy then Biggi is happy.” 

2. Fuzzy descriptors such as tall men, rich people, small integers, most, several, 
or few. 

3. Fuzzy questions. 


Fuzzy questions are reformulated in such a way that additional translation rules 
for questions are unnecessary. Questions such as “How A is B?” will be expressed 
in the form “B is ?A,” where B is the body of the question and “?A” indicates the 
form of an admissible answer, which can be a possibility distribution (indicated
as π); a truth value (indicated as τ); a probability value (indicated as λ); or a
possibility value (indicated as ω).

The question “How tall is Paul?” to which a possibility distribution is expected
as an answer, is phrased “Paul is ?π” (rather than “How tall is Paul ?π”). “Is it
true that Katrin is pretty?” would then be expressed as “Katrin is pretty ?τ” and
“Where is the car ?ω” as “The car is ?ω.”

PRUF is an intentional language, that is, an expression in PRUF is supposed 
to convey the intended rather than the literal meaning of the corresponding 
expression in a natural language. Transformations of expressions are also intended 
to be meaning-preserving. Translation rules are applied singly or in combination 
to yield an expression, E, in PRUF that is a translation of a given expression, e, 
in a natural language. 

The most important basic categories of translation rules in PRUF are 


Type I Rules pertaining to modification 
Type II Rules pertaining to composition 
Type III Rules pertaining to quantification 
Type IV Rules pertaining to qualification 


Examples of propositions to which these rules apply are the following [Zadeh 
1981a, p. 29]: 


Type I     X is very small.
           X is much larger than Y.
           Eleanor was very upset.
           The man with the blond hair is very tall.
Type II    X is small and Y is large. (conjunctive composition)
           X is small or Y is large. (disjunctive composition)
           If X is small, then Y is large. (conditional composition)
           If X is small, then Y is large else Y is very large. (conditional and
           conjunctive composition)
Type III   Most Swedes are tall.
           Many men are much taller than most men.
           Most tall men are very intelligent.
Type IV    Abe is young is not very true. (truth qualification)
           Abe is young is quite probable. (probability qualification)
           Abe is young is almost impossible. (possibility qualification)


Rules of Type I 


Type I rules concern the modification of fuzzy sets representing propositions by 
means of hedges or modifiers (see definition 9-3). 
If the proposition

p ≜ N is F


translates into the possibility assignment equation

$$\pi_{(X_1, \ldots, X_n)} = F$$

then the translation of the modified proposition


p* ≜ N is mF is

$$\pi_{(X_1, \ldots, X_n)} = F^{*}$$

where F* is a modification of F by the modifier m. As mentioned in section 9.1,
the modifier “very” is defined to be the squaring operation, “more or less” the
dilation, and so on.


Example 9-11 


Let p be the proposition “Hans is old,” where “old” may be the fuzzy set defined 
in example 9-1. The translation of p* = “Hans is very old,” assuming “very” to 
be modeled by squaring, would then be 


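$$\pi_{\text{Age(Hans)}} = (\text{old})^{2} = \{(u, \mu_{(\text{old})^{2}}(u)) \mid u \in [0, 100]\}$$
where

$$\mu_{(\text{old})^{2}}(u) = \begin{cases} 0 & u \in [0, 50] \\ \left(1 + \left(\dfrac{u - 50}{5}\right)^{-2}\right)^{-2} & u \in (50, 100] \end{cases}$$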





Rules of Type II 
Rules of type II translate compound statements of the type
p = q * r


where * denotes a logical connective, for example, and (conjunction), or
(disjunction), if . . . then (implication), and so on. Here, essentially the definitions of
connectives given in sections 9.1 and 9.2 are used in PRUF.

If the statements q and r are 
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q=M _ is F > Tix a, x =F 
r=N is G— Ty a, y) =G 
then 
(M is F)and(N is G) > Tx, a, Xati., yy =FXG 
where 
F x G = {((u, v), u axa (u, Vu E U, ve V} 
and 


Laxg(u, v) = min{u r (u), yo (v)} 


“If M is F, then N is G’ > TUX)... Xa Yi... Yp) = F’ ® G where F? and G’ are the 
cylindrical extensions of F and G and @ is the bounded sum defined in defini- 
tion 3—9. Hence 

Ug oč, (u, v) = min{1, pa (u) + ue (v)} 


Example 9-12 [Zadeh 1981a, pp. 32-33]

Assume that U = V = {1, 2, 3}, M ≜ X, N ≜ Y, and

F ≜ small ≜ {(1, 1), (2, .6), (3, .1)}
G ≜ large ≜ {(1, .1), (2, .6), (3, 1)}


Then X is small and Y is large →

π(X, Y) = {[(1, 1), .1], [(1, 2), .6], [(1, 3), 1], [(2, 1), .1], [(2, 2), .6],
           [(2, 3), .6], [(3, 1), .1], [(3, 2), .1], [(3, 3), .1]}


X is small or Y is large →

π(X, Y) = {[(1, 1), 1], [(1, 2), 1], [(1, 3), 1], [(2, 1), .6], [(2, 2), .6],
           [(2, 3), 1], [(3, 1), .1], [(3, 2), .6], [(3, 3), 1]}


If X is small, then Y is large →

π(X, Y) = {[(1, 1), .1], [(1, 2), .6], [(1, 3), 1], [(2, 1), .5], [(2, 2), 1],
           [(2, 3), 1], [(3, 1), 1], [(3, 2), 1], [(3, 3), 1]}
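The three compositions of example 9-12 can be recomputed with a short Python sketch. It assumes min for the conjunction, max for the disjunction, and min{1, 1 − μ_F(u) + μ_G(v)} for the conditional, which is the form that reproduces the value .5 at (2, 1) in the example; the helper name `compose` is hypothetical.

```python
from itertools import product

small = {1: 1.0, 2: 0.6, 3: 0.1}
large = {1: 0.1, 2: 0.6, 3: 1.0}

def compose(f, g, op):
    """Build a joint possibility distribution on U x V from two fuzzy sets."""
    return {(u, v): op(f[u], g[v]) for u, v in product(f, g)}

conjunction = compose(small, large, min)
disjunction = compose(small, large, max)
# Conditional: bounded sum of the complement of F and G.
conditional = compose(small, large, lambda a, b: min(1.0, 1.0 - a + b))

for name, rel in [("and", conjunction), ("or", disjunction), ("if-then", conditional)]:
    print(name, {pair: round(mu, 2) for pair, mu in rel.items()})
```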


Translation rules of type II can, of course, also be applied to propositions containing
linguistic variables. In some applications, it is convenient to represent
fuzzy relations as tables (such as those shown in section 6.1). These tables can
also be processed in PRUF.


Rules of Type III


Type III translation rules pertain to the translation of propositions of the form

$$p \triangleq Q\,N \text{ are } F$$


where N may also be a fuzzy set and Q is a so-called quantifier, for example, a 
term such as most, many, few, some, and so on. Examples are 


Most children are cheerful. 
Few lazy boys are successful. 
Some men are much richer than most men. 


A quantifier, Q, is in general a fuzzy set of which the universe is either the set of 
integers, the unit interval, or the real line. 

Some quantifiers, such as most, many, and so on, refer to proportions of sets
that may be either crisp or fuzzy. In this case, the definition of a quantifier makes
use of the cardinality or the relative cardinality, as defined in definition 2-5.

In PRUF, the notation prop(F/G) is used to express the proportion of F in G,
where

$$\text{prop}(F/G) = \frac{\text{count}(F \cap G)}{\text{count}(G)} = \frac{|F \cap G|}{|G|}$$

and “count” corresponds to the above-mentioned cardinality. The quantifier
“most” may then be a fuzzy set

$$Q = \{[\text{prop}(F/G), \mu_{\text{most}}(u, v)] \mid u \in F, v \in G\}$$


Example 9-13 
The quantifier “several” could, for instance, be represented by 


Q = several = {(3, .3), (4, .6), (5, 1), (6, .8), (7, .6), (8, .3)}


Rules of Type IV 


In PRUF, the concept of truth serves to make statements about the relative truth of
a proposition p with respect to another reference proposition (and not with respect
to reality!). Truth is taken to be a linguistic variable, as defined in section 9.1.
Truth is then interpreted as the consistency of proposition p with proposition q. If

$$p \triangleq N \text{ is } F \rightarrow \pi_N = F$$
$$q \triangleq N \text{ is } G \rightarrow \pi_N = G$$

then the consistency of p with q is given as

$$\text{cons}\{N \text{ is } F \mid N \text{ is } G\} \triangleq \text{poss}\{N \text{ is } F \mid N \text{ is } G\} = \sup_{u \in U} \min\{\mu_F(u), \mu_G(u)\}$$


Example 9-14 


Let

p ≜ N is a small integer
q ≜ N is not a small integer

where

small integer ≜ {(0, 1), (1, 1), (2, .8), (3, .6), (4, .5), (5, .4), (6, .2)}

Then

cons{p | q} = sup{0, 0, .2, .4, .5, .4, .2}
            = .5
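The consistency computation of example 9-14 can be checked with a small Python sketch. Truncating the universe to the integers 0 through 10 is an assumption made here for illustration; beyond 6 the grade of “small integer” is 0, so the minimum is 0 there and the result does not change.

```python
def consistency(f, g, universe):
    """cons{N is F | N is G} = sup over u of min(mu_F(u), mu_G(u))."""
    return max(min(f.get(u, 0.0), g.get(u, 0.0)) for u in universe)

universe = range(0, 11)  # assumed finite cut of the integers
small_integer = {0: 1.0, 1: 1.0, 2: 0.8, 3: 0.6, 4: 0.5, 5: 0.4, 6: 0.2}
# "not small integer" as the standard complement 1 - mu.
not_small_integer = {u: 1.0 - small_integer.get(u, 0.0) for u in universe}

cons = consistency(small_integer, not_small_integer, universe)
print(cons)
```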


More in line with fuzzy set theory is the consideration of the truth of a propo- 
sition as a fuzzy number. Therefore Zadeh defines in the context of PRUF truth 
as follows: 


Definition 9-10 [Zadeh 1981a, p. 42]


Let p be a proposition of the form “N is F,” and let r be a reference proposition,
r ≜ N is G, where F and G are subsets of U. Then the truth, τ, of p relative to r
is defined as the compatibility of r with p, that is,

$$\tau \triangleq \text{Tr}(N \text{ is } F \mid N \text{ is } G) \triangleq \text{comp}(N \text{ is } G \mid N \text{ is } F) \triangleq \mu_F(G) \triangleq \{(v, \mu_\tau(v)) \mid v \in [0, 1]\}$$

with

$$\mu_\tau(v) = \sup_{u \in U:\ \mu_F(u) = v} \mu_G(u)$$

The rule for truth qualification in PRUF can now be stated as follows [Zadeh
1981a, p. 44]: Let p be a proposition of the form

$$p \triangleq N \text{ is } F$$

and let q be a truth-qualified version of p expressed as

$$q \triangleq N \text{ is } F \text{ is } \tau$$

where τ is a linguistic truth value. q is semantically equivalent to the reference
proposition, that is,

$$N \text{ is } F \text{ is } \tau \leftrightarrow N \text{ is } G$$

where F, G, and τ are related by

$$\tau = \mu_F(G)$$

In analogy to truth qualification, translation rules for probability qualification 

and possibility qualification have been developed in PRUF. 


Example 9-15 


Let

U = ℕ₀ = {0, 1, 2, ...}, N ∈ ℕ₀
p ≜ N is small
r ≜ N is approximately 4

where

small ≜ {(0, 1), (1, 1), (2, .8), (3, .6), (4, .4), (5, .2)}
approximately 4 ≜ {(1, .1), (2, .2), (3, .5), (4, 1), (5, .5), (6, .2), (7, .1)}

Then

τ = Tr(N is small | N is approximately 4)
  = comp(N is approximately 4 | N is small)
  = {(μ_small(u), μ_approximately 4(u)) | u ∈ U}
  = {(0, .2), (.2, .5), (.4, 1), (.6, .5), (.8, .2), (1, .1)}
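The truth value of example 9-15 can be reproduced with a short Python sketch. It pairs each grade μ_small(u) with μ_approximately 4(u) and, where several u share the same grade of “small”, keeps the largest grade of “approximately 4”; taking this supremum is an assumption that matches the numbers in the example. The function name `compatibility` and the cut of the universe at 7 are choices made for illustration.

```python
def compatibility(f, g, universe):
    """comp(N is G | N is F): map each grade v = mu_F(u) to the sup of mu_G(u)."""
    tau = {}
    for u in universe:
        v = f.get(u, 0.0)
        tau[v] = max(tau.get(v, 0.0), g.get(u, 0.0))
    return dict(sorted(tau.items()))

small = {0: 1.0, 1: 1.0, 2: 0.8, 3: 0.6, 4: 0.4, 5: 0.2}
approximately_4 = {1: 0.1, 2: 0.2, 3: 0.5, 4: 1.0, 5: 0.5, 6: 0.2, 7: 0.1}
universe = range(0, 8)  # both sets have grade 0 beyond 7

tau = compatibility(small, approximately_4, universe)
print(tau)
```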


9.5 Support Logic Programming and Fril 
9.5.1 Introduction 


Fril is a logic programming style implementation of support logic programming 
[Baldwin 1986, 1987, 1993]. It is a complete programming system with an incre- 
mental compiler, on-line help, a step-by-step debugger, modular code develop- 
ment, and optimization [Baldwin, Martin, and Pilsworth 1995]. It is written in C 
and is a Prolog system if no uncertainties are used. The style of programming
can include the object-oriented paradigm by introducing the concept of a fuzzy
object. A menu-driven window environment with dialogue boxes can be written 
in Fril to provide the intelligent systems application with a friendly front end. 
Fril can also be linked to Mathematica [Wolfram 1993], allowing mathematical 
equations to be solved as part of the inference process. Mathematical commands 
can be sent from Fril directly to Mathematica, and answers received by Fril can 
act as data for part of some inference process. 

Fril is an ideal language for soft computing, since it is an efficient general 
logic programming language with special structures to handle uncertainty and 
imprecision. Four types of rules are allowed in Fril: 


1. Prolog style rule
2. Probabilistic fuzzy rule
3. Causal relational rule
4. Evidential logic rule


The popularity and success of fuzzy control, which uses simple IF... THEN 
rules, should motivate knowledge engineers to investigate the use of Fril and 
fuzzy methods for intelligent systems. We would expect areas of application 
such as expert systems for large-scale engineering systems, vision-understanding 
systems, planning, robotics, military systems, medical and engineering diagnosis, 
economic planning, human interface systems, and data compression to benefit 
from this more general modeling approach. 

The fuzzy sets representing possible feature values and the importance given 
to these features can be automatically derived from a data set of examples. The 
rules derived in this way provide a generalization of the specific instances given 
in the data set. This, along with the Fril inference rules, provides a theory of 
generalization and decision suitable for machine intelligence. 


9.5.2 Fril Rules 


The Fril rules of types II, III, and IV are of the form:
<head> IF <body> : <list of support pairs>


where the head of the rule can contain a fuzzy set. In the case of rules of types 
II and III, the body of the rule can be a conjunction of terms, a disjunction of 
terms, or a mixture of the two, and each term can contain a fuzzy set. The body 
of the fourth rule is a list of weighted features, where a feature is simply a con- 
dition that may contain a fuzzy set or the head of another rule. The list of support 
pairs provides intervals containing conditional probabilities of some instantiation 
of the head given some instantiation of the body. 

An example of each type of rule is as follows:


Example 9-16: Rule of Type II


((suitability place X for sports stadium Y is high)
 (access X from other parts of city is easy)
 (cost_to_build Y at X is fairly cheap)): [0.9, 1]


This rule states that there is a high probability that any place X is highly suitable 
to build a sports stadium Y if X is easily accessed and Y can be built fairly cheaply 
at X. 


Example 9-17: Rules of Type III 


((shoe_size man X is large)
 ((height X is tall) (height X is average) (height X is small))): [0.8, 1]
 [0.5, 0.6] [0, 0.1]


This rule states that the probability of a tall man wearing large shoes is at least
0.8. The probability that a man of average height wears large shoes is
between 0.5 and 0.6. The probability that a small man wears large shoes is at most
0.1.

We can think of the rule as representing the relationship between two variables,
S and H, where S is shoe size and H is height of man. S is instantiated
to large, while H has three instantiations in the body of the rule. The rule
expresses Pr(S is large | H is h_i), where h_i is a particular fuzzy instantiation of H.
This type of rule is useful to represent fuzzy causal nets and many other types of
applications.


Example 9-18: Rules of Type IV 


((suitability_as_secretary person X is good) 
(evlog most((readability handwriting of X, high) 0.1 
(neatness(X, fairly good)) 0.1 
(qualifications X, applicable) 0.2 
(concentration X, long) 0.1 
(typing skills X, very good) 0.3 
(shorthand X, adequate) 0.2))): [1, 1] [0, 0] 


This rule says that a person’s suitability as a secretary is good if most of the
weighted features in the body of the rule are satisfied. The term “most” is a fuzzy
set that is chosen to provide optimism for those persons who satisfy the criteria
well and pessimism for those who satisfy the criteria badly. Type IV rules are
evidential logic rules and can be used for vision understanding, classification,
and case-based reasoning. The satisfaction of features such as (qualifications X
applicable) is determined from another rule with satisfaction as head. Methods
can be used to determine near optimal weights and the fuzzy sets in the body of
the rules from a data set of examples [Baldwin 1994]. These are discussed below.


Meta Rules 


Types III and IV rules can be written in terms of types I and II rules. Other rules,
which we can call meta rules, can be similarly defined in Fril.


9.5.3 Inference Methods in Fril 


Consider a statement such as 
most tall persons wear large shoes 


The words most, tall, and large are fuzzy sets representing the vagueness of the
definitions of these concepts.
This sentence can be replaced by the equivalent statement


Pr(a person X wears large shoes | X is tall) ≥ 0.95


if we interpret “most” as the fuzzy set “greater_than_95%.” We can simplify
further if we replace the fuzzy set “greater_than_95%” with the support pair
[0.95, 1], where a support pair is an interval containing a probability.

This could be written as a Fril rule: 


((shoe_size of X large) 
(height of X tall)): [0.95, 1] 


The discrete fuzzy set large defined on the size domain and the continuous 
fuzzy set tall defined on the height are represented as list structures in Fril. For 
example, 


set (height_domain (4 8))
set (size_domain (4 5 6 7 8 9 10 11 12 13))
(tall [5.8: 0, 6: 1] height_domain)
(large {9: 0.3, 10: 0.5, 11: 0.9, 12: 1, 13: 1} size_domain)


The height domain is all heights in the range [4ft, 8ft], and the size domain
is the list of shoe sizes {4 5 6 7 8 9 10 11 12 13}. The membership of each element
in the discrete fuzzy set is given to the right of the colon. For the continuous
fuzzy set, the membership is 0 for all heights in the height domain smaller
than 5.8 and 1 for all heights in the height domain larger than 6, and linear
interpolation is used to determine the membership value for heights in the range
[5.8, 6].
Assume we know the fact


((height of John average))

where the fuzzy set average is defined using the Fril statement

(average [5.8: 0, 5.9: 1, 6: 0] height_domain)
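The breakpoint notation for continuous fuzzy sets can be mimicked in Python as follows. This is a sketch of the linear interpolation described above, not Fril's own implementation; the function name `membership` and the representation as a list of (value, grade) pairs are assumptions.

```python
def membership(points, x):
    """Piecewise-linear membership from sorted (value, grade) breakpoints.

    Outside the breakpoints the grade of the nearest breakpoint is held,
    which matches the description of tall (0 below 5.8, 1 above 6).
    """
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, m0), (x1, m1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return m0 + (m1 - m0) * (x - x0) / (x1 - x0)

tall = [(5.8, 0.0), (6.0, 1.0)]
average = [(5.8, 0.0), (5.9, 1.0), (6.0, 0.0)]

for height in (5.5, 5.85, 5.9, 6.2):
    print(height, round(membership(tall, height), 3), round(membership(average, height), 3))
```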


Then we should be able to conclude something like shoe_size of John is 
more_or_less_fairly_large. We would like to be able to provide an estimate from 
the fuzzy set conclusion for the size of John’s shoes. This corresponds to defuzzi- 
fying the fuzzy set conclusion. We would only defuzzify if asked for a precise 
value. 


How can we determine the fuzzy set f for the conclusion 
((shoe_size of X f)) 

and how can we defuzzify this conclusion to give us the conclusion 
((shoe_size of John s)) 

corresponding to defuzzified value s? 


The term in the body of the rule (height of X tall) is matched to (height of John 
tall) with X instantiated to John. There is only a partial match because average 
only partially matches the term “tall.” The mass assignment theory allows us to 
determine an interval containing the conditional probability 


Pr{(height of John tall)|(height of John average) } 


This interval can be denoted by [x_1, x_2]. The process of determining this interval
is called interval semantic unification. Fril automatically determines this interval.
There is also a point-version semantic unification in which a point value is deter- 
mined by intelligent filling in for unknown information. A query can be asked in 
Fril such that point semantic unification is used. In this case, Fril returns 


Pr{(height of John tall)|(height of John average)} = x 


We now know that the body of the rule is satisfied with a belief or probability given
by the support pair [x_1, x_2] or point value x. x_1 gives the necessary support for the
body of the rule, and x_2 gives the possible support for the body of the rule. 1 − x_2
gives the necessary support against the body of the rule being satisfied. We can
now use an interval version of Jeffrey’s rule of inference to determine a support
pair for the consequence of the rule [Baldwin 1991]. Jeffrey’s rule is of the form

$$\Pr{}'(h) = \sum_{i=1}^{n} \Pr(h \mid b_i)\,\Pr{}'(b_i)$$

where {Pr(h|b_i)} represent conditional probabilities determined from a population
of objects and {Pr′(b_i)} are probabilities or beliefs about a given object from the
population. These primed probabilities are not determined with reference to the
population of objects; they are specific to the one object under
investigation. To make this clearer, consider the following example. From past
observations and examination results, it is known that in a given school 90% of
hardworking students obtain good passes in their final examinations. The probability
Pr(good pass|hardworking) is obtained from population considerations.
Consider a new boy to the school. By interviewing the boy and from references,
we estimate a belief that this boy will be hardworking, say, 0.7. The probability
Pr′(new boy hardworking) = 0.7 is specific to the new boy and is not related to
Pr(hardworking), which would be the proportion of boys in the school who are
hardworking. Jeffrey’s rule is similar to the theorem of total probabilities but with
a mixture of population-estimated probabilities and specific beliefs.
In terms of the above example, Jeffrey’s rule is 


Pr{(shoe_size of John large)} =
  Pr{(shoe_size of John large) | (height of John tall)} Pr{(height of John tall)}
  + Pr{(shoe_size of John large) | ¬(height of John tall)} Pr{¬(height of John tall)}


We know

Pr{(height of John tall)} is contained in the interval [x_1, x_2].

From this we can deduce

Pr{(shoe_size of John large)} is contained in the interval [y, 1],
where y = 0.95 x_1,

since we know

Pr{(shoe_size of John large) | (height of John tall)} ∈ [0.95, 1]

and

Pr{(shoe_size of John large) | ¬(height of John tall)} ∈ [0, 1].


We must now convert this to a statement containing only a fuzzy set but no 
probabilities. 
From the basic concept of a support pair, we can state 


Pr{(shoe_size of John large)} = y
Pr{(shoe_size of John ¬large)} = 1 − 1 = 0
Pr{(shoe_size of John any_possible_size)} = 1 − y


We use these three conclusions to determine a membership function for the fuzzy
set f in the statement

(shoe_size of John f)

by calculating f as the expected fuzzy set. Thus

$$\mu_f(s) = y\,\mu_{\text{large}}(s) + (1 - y) \quad \text{for all } s$$

We can defuzzify this fuzzy set, as described later. Briefly, we use the fuzzy set 
f to determine a least prejudiced probability distribution over the shoe_size 
domain and choose the size with the highest probability. If the domain for 
shoe_size had been a continuous domain, then we would defuzzify by choosing 
the mean of the distribution. 

If point semantic unification is used rather than the interval semantic unifica- 
tion, then Fril would give the above solution but with y = 0.95x. 
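The construction of the expected fuzzy set can be sketched in Python. The lower support x_1 = 0.4 used below is a hypothetical value chosen only for illustration (Fril would obtain it by semantic unification), and y = 0.95 x_1 follows the derivation above: mass y goes to the set large and mass 1 − y to the whole shoe size domain.

```python
large = {9: 0.3, 10: 0.5, 11: 0.9, 12: 1.0, 13: 1.0}
size_domain = range(4, 14)  # shoe sizes 4 .. 13

def expected_fuzzy_set(mu, domain, y):
    """Mass y on the fuzzy set mu, mass 1 - y on the whole domain."""
    return {s: y * mu.get(s, 0.0) + (1.0 - y) for s in domain}

x1 = 0.4          # hypothetical lower support for (height of John tall)
y = 0.95 * x1     # lower support for (shoe_size of John large)
f = expected_fuzzy_set(large, size_domain, y)

print({s: round(mu, 3) for s, mu in f.items()})
```

Every size keeps at least the grade 1 − y, which reflects the ignorance left over when the body of the rule is only partially supported.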


9.5.4 Fril Inference for a Single Rule 


Consider the inference for a single Fril rule of the form

((h) (b_1) ... (b_n)) : ((u_1 v_1) ... (u_n v_n))

when the following facts are given:

((b_i)) : (α_i β_i); all i


More generally, the facts will not completely match the terms in the rule, and the
support pairs (α_i β_i), all i, will be determined using semantic unification. A
generalized Jeffrey’s rule for support pairs is the basic inference rule of Fril, as
discussed above, so that h : (z_1 z_2), where

$$z_1 = \min \sum_i u_i \theta_i \quad \text{where } \alpha_i \le \theta_i \le \beta_i \text{ and } \sum_i \theta_i = 1$$

$$z_2 = \max \sum_i v_i \theta_i \quad \text{where } \alpha_i \le \theta_i \le \beta_i \text{ and } \sum_i \theta_i = 1$$


These are trivial optimization problems. 
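One way to see why these problems are easy is that each is a linear program over a box with a single sum constraint, which a greedy allocation solves. The Python sketch below is an illustration under that reading, not Fril's actual algorithm; the function name `bound` and the numerical values are hypothetical.

```python
def bound(coeffs, alphas, betas, maximize=False):
    """Optimize sum(coeffs[i] * theta[i]) with alphas <= theta <= betas, sum(theta) = 1."""
    total_lower = sum(alphas)
    if total_lower > 1 + 1e-12 or sum(betas) < 1 - 1e-12:
        raise ValueError("support pairs admit no distribution summing to 1")
    theta = list(alphas)                  # start every theta at its lower limit
    remaining = 1.0 - total_lower         # probability mass still to place
    # Place the remaining mass on the cheapest (or most valuable) terms first.
    order = sorted(range(len(coeffs)), key=lambda i: coeffs[i], reverse=maximize)
    for i in order:
        step = min(remaining, betas[i] - alphas[i])
        theta[i] += step
        remaining -= step
    return sum(c * t for c, t in zip(coeffs, theta))

# Hypothetical rule supports (u_i v_i) and fact supports (alpha_i beta_i).
u = [0.8, 0.5, 0.0]
v = [1.0, 0.6, 0.1]
alphas = [0.2, 0.3, 0.1]
betas = [0.5, 0.6, 0.4]

z1 = bound(u, alphas, betas)
z2 = bound(v, alphas, betas, maximize=True)
print(round(z1, 4), round(z2, 4))
```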

Each b; can be a conjunction of terms, a disjunction, and a mixture of the two. 
A calculus based on probability theory is used to compute the support pair for 
any b; with respect to the support pairs of its individual terms. 

The inference rule for the basic rule is a special case of this, since the basic
rule is equivalent to

((h) (b) (¬b)) : ((u_1 v_1) (u_2 v_2))


For the evidential logic rule of the form


((h) (evlog f ((c_1 w_1) ... (c_n w_n)))) : ((x_1 y_1) (x_2 y_2))


with facts

((c_i)) : (α_i β_i)

the support pair given to the body of the rule is

$$\left(f\Big(\sum_i w_i \alpha_i\Big)\;\; f\Big(\sum_i w_i \beta_i\Big)\right)$$


The basic inference rule is then used to give the final support pair for the head 
(h). 

The point semantic unification case is only a special case of this where the
supports (α_i, β_i) are replaced with point values.


9.5.5 Multiple Rule Case 


More generally, Fril can use several rules with the same head predicate to deter- 
mine a given inference. Consider, for example, the fuzzy logic rules 


((y value is f_1) (x_1 value is g_1) (x_2 value is h_1))
((y value is f_2) (x_1 value is g_2) (x_2 value is h_2))
...
((y value is f_n) (x_1 value is g_n) (x_2 value is h_n))


for determining the value of y given values for x_1 and x_2. {f_i}, {g_i}, and {h_i} are
fuzzy sets defined on the domains for y, x_1, and x_2, respectively. If we provide
the facts


((x_1 is about_a))
((x_2 is about_b))


where about_a is a fuzzy set defined on the domain for x_1 and about_b a fuzzy
set defined on the domain for x_2, then Fril uses each rule to obtain


(y value is f_1) : (x_1 y_1)
(y value is f_2) : (x_2 y_2)
...
(y value is f_n) : (x_n y_n)

where (x_i y_i) here denotes the support pair obtained from rule i. Fril then determines


(y value is f_a1)
(y value is f_a2)
...
(y value is f_an)


where each f_ai is an expected fuzzy set determined as described previously. These are
intersected to give the final solution


(y value is f_a)


where f_a = f_a1 ∩ f_a2 ∩ ... ∩ f_an and ∩ is fuzzy intersection.
For multiple rules with the same head where the heads do not contain fuzzy
sets, the support pairs are intersected.
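The combination step can be sketched in Python under two assumptions that the text does not spell out: the expected fuzzy set for a support pair (lo, hi) puts mass lo on the head set, mass 1 − hi on its complement, and mass hi − lo on the whole domain (this reduces to the formula of section 9.5.3 when hi = 1), and fuzzy intersection is taken as the pointwise minimum. All names and numbers below are hypothetical.

```python
def expected_fuzzy_set(mu, support, domain):
    """Expected fuzzy set for head set mu with support pair (lo, hi); assumed form."""
    lo, hi = support
    return {
        s: lo * mu.get(s, 0.0) + (1.0 - hi) * (1.0 - mu.get(s, 0.0)) + (hi - lo)
        for s in domain
    }

def intersect(fuzzy_sets, domain):
    """Fuzzy intersection modeled as the pointwise minimum (an assumption)."""
    return {s: min(fs[s] for fs in fuzzy_sets) for s in domain}

domain = [0, 1, 2]
f1 = {0: 1.0, 1: 0.5, 2: 0.0}   # head set of a first rule
f2 = {0: 0.0, 1: 0.5, 2: 1.0}   # head set of a second rule

fa1 = expected_fuzzy_set(f1, (0.8, 1.0), domain)
fa2 = expected_fuzzy_set(f2, (0.1, 0.3), domain)
fa = intersect([fa1, fa2], domain)
print({s: round(mu, 3) for s, mu in fa.items()})
```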


9.5.6 Interval and Point Semantic Unification 


We will first explain the concepts involved in the Fril semantic unification using 
a simple example. This explanation will be in terms of discrete fuzzy sets. Fril 
handles both discrete and continuous fuzzy sets, and the algorithm is optimized 
for computational efficiency. 

Consider the Fril program: 


set (dice_dom (1, 2, 3, 4, 5, 6)) 

(small {1:1, 2:1, 3:0.3} dice_dom) 
(about_2 {1:0.3, 2:1, 3:0.3} dice_dom) 
((dice shows small)) 


If we ask the query 
qs((dice shows about_2))

which asks for the support that the dice shows about_2, then Fril returns 

((dice shows about_2)): (0.3 1) 
The point semantic query 

qs_p((dice shows about_2)) 

returns 

((dice shows about_2)): 0.615 


In other words, Fril calculates Pr{(dice shows about_2)|(dice shows small)} € 
[0.3, 1] for interval semantic unification and Pr{(dice shows about_2) | (dice 
shows small)} = 0.615 for point semantic unification. How is this done?

The fuzzy sets small and about_2 can be written as mass assignments [Baldwin
1992], namely,


m_small = {1, 2}: 0.7, {1, 2, 3}: 0.3
m_about_2 = {2}: 0.7, {1, 2, 3}: 0.3


where a mass assignment is equivalent in this case to a Dempster/Shafer basic 
probability assignment. We can depict these graphically as in the table below. The 
given information is depicted at the top of the table. In each cell we can denote 
the truth of the left-hand set given the top set. This truth value will be t, f, or u,
representing true, false, or uncertain, respectively. For example, the truth of {2} 
given {1, 2} is uncertain since if the dice shows 1, then {2} will be false, while 
if it shows 2, then {2} will be true. What mass should we associate with each of 
the cells? Baldwin’s theory of semantic unification states that the masses in the 
cells should satisfy the following row and column constraints: The column cell 
masses should sum to the column mass, and the row cell masses should sum to 
the corresponding row mass. 


                   0.7          0.3
                   {1, 2}       {1, 2, 3}
0.7  {2}           u  m11       u  m12
0.3  {1, 2, 3}     t  m21       t  m22


Thus
m11 + m12 = 0.7
m21 + m22 = 0.3
m11 + m21 = 0.7
m12 + m22 = 0.3


This will not provide a unique solution. One solution is to multiply the column
and row masses to obtain the corresponding cell mass. This procedure can be
thought of as assuming independence of the mass assignment in the Fril program
and of that given in the query. Fril uses this multiplication model, giving


m11 = 0.49, m12 = 0.21, m21 = 0.21, and m22 = 0.09.


Thus we have the truth mass assignment

t: 0.3, {t, f}: 0.7


so that the support for Pr(about_2|small) = [0.3, 1].

A point semantic solution is obtained in the same way, but m11 and m12 are
modified to give their contributions to true, assuming an equally likely probability
distribution for dice values for the given information. Therefore we modify
m11 to 0.5 m11 and m12 to (1/3) m12, since {2} is true for one of the two equally
likely values in {1, 2} and for one of the three equally likely values in {1, 2, 3}.
This provides the modified table below:


                   0.7                   0.3
                   {1, 2}                {1, 2, 3}
0.7  {2}           0.5 × 0.49 = 0.245    (1/3) × 0.21 = 0.07
0.3  {1, 2, 3}     0.21                  0.09

Pr(about_2|small) = 0.245 + 0.07 + 0.21 + 0.09 = 0.615


If there are cells with an f entry, then the upper support for interval semantic
unification will be less than 1.
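The dice example can be recomputed with the following Python sketch of the multiplication model. It is an illustration of the procedure described above rather than Fril's optimized algorithm; the function names are hypothetical, the fuzzy sets are assumed to be normalized (largest grade 1), and the optional `prior` argument is an assumed way of passing non-uniform weights for the domain elements.

```python
def mass_assignment(fuzzy_set):
    """Focal sets and masses of a normalized discrete fuzzy set."""
    levels = sorted({mu for mu in fuzzy_set.values() if mu > 0}, reverse=True)
    masses = []
    for i, level in enumerate(levels):
        next_level = levels[i + 1] if i + 1 < len(levels) else 0.0
        focal = frozenset(x for x, mu in fuzzy_set.items() if mu >= level)
        masses.append((focal, level - next_level))
    return masses

def weight(elements, prior):
    """Total prior weight of a set; uniform when no prior is given."""
    return sum(prior[x] if prior else 1.0 for x in elements)

def semantic_unification(query, given, prior=None):
    """Return (lower support, upper support, point value) for Pr(query | given)."""
    lower = upper = point = 0.0
    for q_set, q_mass in mass_assignment(query):
        for g_set, g_mass in mass_assignment(given):
            cell = q_mass * g_mass              # multiplication model
            common = q_set & g_set
            if g_set <= q_set:                  # cell is "true"
                lower += cell
            if common:                          # cell is "true" or "uncertain"
                upper += cell
            point += cell * weight(common, prior) / weight(g_set, prior)
    return lower, upper, point

small = {1: 1.0, 2: 1.0, 3: 0.3}
about_2 = {1: 0.3, 2: 1.0, 3: 0.3}

lower, upper, point = semantic_unification(about_2, small)
print(round(lower, 3), round(upper, 3), round(point, 3))
```

With the uniform prior this gives, up to rounding, the support pair and the point value quoted in the text for the query on about_2.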

The point semantic unification satisfies the normalization condition and the 
Dubois/Prade consistency condition, i.e., 


$$\Pr(f \mid g) + \Pr(\bar{f} \mid g) = 1$$
$$\Pr(A \mid g) \le \Pi(A \mid g)$$


where f and g are fuzzy sets defined on the same domain, $\bar{f}$ is the complement
of f, A is any subset of the domain, and Π is Zadeh’s possibility measure. The
multiplication model arises from relative entropy considerations discussed by 
Baldwin [1991], as does the use of Jeffrey’s rule for inference. It should be noted
that if the prior on the domain elements differs from the equally likely distribution,
then this will be taken into account when the point semantic unification is
performed. Suppose in the above dice example it was known that the dice was
weighted and had the prior {1: 1/9, 2: 2/9, 3: 1/9, 4: 2/9, 5: 1/9, 6: 2/9}; then
{2} has probability 2/3 given {1, 2} and 1/2 given {1, 2, 3}, so that


Pr(about_2|small) = (0.49)(2/3) + (0.21)(1/2) + 0.3 ≈ 0.7317


9.5.7 Least Prejudiced Distribution and Learning


The fuzzy sets occurring in the various Fril rules can be determined automatically
from a database of examples. For example, suppose we have a database of
values of y = F(x) for a range of values of x and we want to approximate the
function using the fuzzy logic rules


((y has value in f_i) (x has value in g_i)) for i = 1, ..., n


where {f_i} and {g_i} are fuzzy sets defined on the Y and X domains, respectively.
Suppose further that we choose the {f_i} to be triangular fuzzy sets on the Y
domain. How should we choose {g_i} to provide a good approximation to the
function? The inference method for a given input for X is that described in sections
9.5.4 and 9.5.5. Defuzzification using the mean of the least prejudiced distribution is
used as the estimate for F(x).

In this section, we will define what is meant by the least prejudiced distri- 
bution, outline the method used to determine the fuzzy sets {g;}, and indicate 
how this can be extended to the case of the evidential logic rule. The theory is 
described by Baldwin [1994]. 

Consider a discrete fuzzy set small for the dice problem above. The statement
(dice score is small) provides a possibility distribution over the dice domain,
where π(i) = μ_small(i), i = 1, ..., 6.

According to Baldwin’s theory of mass assignments, this is equivalent to a
family of probability distributions given by the mass assignment


m_small = {1, 2}: 0.7, {1, 2, 3}: 0.3


The mass 0.7 can be distributed among the elements 1 and 2 in any way and the
mass 0.3 among 1, 2, 3 in any way. This gives the family of probability
distributions. The least prejudiced distribution is the one given by allocating a mass
equally among the elements with which it is associated. Thus the least prejudiced
distribution for the fuzzy set small is


lpd_small = 1: 0.35 + 0.1, 2: 0.35 + 0.1, 3: 0.1
giving
lpd_small = 1: 0.45, 2: 0.45, 3: 0.1


Fril extends this to the continuous case and provides a least prejudiced distribu- 
tion for any fuzzy set. 

Defuzzification instantiates the value to the mean value of this least prejudiced 
distribution. 
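For the discrete case, the least prejudiced distribution and the defuzzified value can be computed as in the following Python sketch. The function names are hypothetical, the fuzzy set is assumed to be normalized, and the continuous case handled by Fril is not covered.

```python
def mass_assignment(fuzzy_set):
    """Focal sets and masses of a normalized discrete fuzzy set."""
    levels = sorted({mu for mu in fuzzy_set.values() if mu > 0}, reverse=True)
    masses = []
    for i, level in enumerate(levels):
        next_level = levels[i + 1] if i + 1 < len(levels) else 0.0
        focal = frozenset(x for x, mu in fuzzy_set.items() if mu >= level)
        masses.append((focal, level - next_level))
    return masses

def least_prejudiced_distribution(fuzzy_set):
    """Share each mass equally among the elements of its focal set."""
    lpd = {}
    for focal, mass in mass_assignment(fuzzy_set):
        for x in focal:
            lpd[x] = lpd.get(x, 0.0) + mass / len(focal)
    return lpd

small = {1: 1.0, 2: 1.0, 3: 0.3}
lpd = least_prejudiced_distribution(small)
mean = sum(x * p for x, p in lpd.items())   # defuzzified value

print({x: round(p, 3) for x, p in sorted(lpd.items())}, round(mean, 3))
```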

Suppose we have a frequency distribution f(x) for values of the attribute X 
determined from a set of examples. Fril determines the appropriate fuzzy set for 
F by ensuring that the least prejudiced distribution for this fuzzy set is f. If the 
classification is fuzzy, as in the above rules for function approximation, then Fril 
takes into account the fact that for some examples the classification will have a 
membership in several rule heads.

If we have a set of examples and for each example we are provided with
attribute values for attributes F_1, ..., F_n and a given classification (c, say), we
can use the above method to derive the fuzzy sets occurring as feature values in
the evidential logic rule. Fril can also determine near optimal weights for the rule
using a specialized discrimination algorithm.
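The step of fitting a fuzzy set to an observed frequency distribution can be illustrated for the discrete case. The text does not give the construction; the formula used below, μ(x) = Σ_j min(p(x), p(x_j)), is an assumption with the stated property that the least prejudiced distribution of the resulting fuzzy set is the original distribution. The function names are hypothetical.

```python
def fuzzy_set_from_distribution(probs):
    """Fuzzy set whose least prejudiced distribution is probs (discrete case)."""
    return {x: sum(min(p, q) for q in probs.values()) for x, p in probs.items()}

def least_prejudiced_distribution(fuzzy_set):
    """Share the mass of each level set equally among its elements."""
    levels = sorted({mu for mu in fuzzy_set.values() if mu > 0}, reverse=True)
    lpd = {x: 0.0 for x in fuzzy_set}
    for i, level in enumerate(levels):
        next_level = levels[i + 1] if i + 1 < len(levels) else 0.0
        focal = [x for x, mu in fuzzy_set.items() if mu >= level]
        for x in focal:
            lpd[x] += (level - next_level) / len(focal)
    return lpd

frequencies = {1: 0.45, 2: 0.45, 3: 0.1}    # the lpd of "small" from the dice example
fitted = fuzzy_set_from_distribution(frequencies)
round_trip = least_prejudiced_distribution(fitted)

print({x: round(mu, 3) for x, mu in fitted.items()})
print({x: round(p, 3) for x, p in round_trip.items()})
```

Applied to the least prejudiced distribution of small, the sketch recovers the grades of small on {1, 2, 3}, and the round trip returns the starting frequencies.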

This approach has been used for function approximation; several 
classification-type problems, such as handwriting character recognition and 
underwater sound recognition from acoustic spectra; and deriving fuzzy control 
rules. The method is an alternative approach to neural supervised learning and 
can be used for similar types of problems. 


9.5.8 Applications of Fril 


The Fril language is an uncertainty logic programming system that can be used 
for fuzzy control, evidential logic reasoning, causal reasoning, classification, and 
other AI applications that require reasoning with missing information, vague 
information, or uncertain information. 

It can be used to build expert systems, decision support systems, vision under- 
standing systems, fuzzy databases, and other AI knowledge engineering applica- 
tions [Baldwin and Martin 1993]. 

For example, Fril has been used to implement an intelligent data browser. 
A window-environment front end is provided that allows the user to enter a 
database or link to an existing database in Oracle, input rules, and ask any 
relevant queries concerning the database. The required evidential logic and 
other rules required to answer a particular query will automatically be con- 
structed. The user can ask for an explanation and can investigate the sensitiv- 
ity of any new rules formed. Queries can be asked about any attribute of the 
database when given information concerning other attributes of the database. 
The given information need not be precise and can be in the form of fuzzy sets 
or intervals or sets of values. The user can contribute to the establishment of 
the required rules in various ways—for example, choosing the type of rule, the 
features in the body of a rule, the weights in an evidential logic rule, or the 
fuzzy sets in a rule. These decisions can be made by the intelligent browser 
automatically, but the user can then make any changes if required. Rules 
formed are retained for future use. When appropriate, the accuracy of a new 
rule can be tested by using the database as test cases for which the answers 
are known. 

This type of module has many applications from scientific, engineering, financial,
and business fields. The system can be used to provide a summary of large
amounts of data, interpolate between database instances, provide approximate
reasoning, derive classifiers, perform case-based reasoning, derive causal nets,
derive probabilistic fuzzy rules, and derive fuzzy controllers.

In the case of classification, for example, the classification could be the suit- 
ability of a house for a given customer and the features would be the various 
qualities of the house such as size of garden, number of bedrooms, size of lounge, 
etc. A representative number of examples of suitable houses would be chosen 
by the customer. A new house on the market could then be tested to see for which 
customers it would be suitable. The database could be the classification of 
creditworthiness of persons. The classification of creditworthiness could be 
{very_good, good, average, poor, very_poor}. The database would consist of past 
customers with their details as features and subjective creditworthiness 
estimated. Another example might be a classification of change in interest 
rate with features representing economic measurable conditions. Classes of 
{very_good, good, average, poor, very_poor} for the potential for oil at a given 
place with geological measurement and other features is another obvious 
example. 

Fril has been successfully used to build an expert system for designing aircraft
structures using composite materials. This expert system calls various analysis
programs in different languages to help with the design and evaluation. Fril
has also been used for command and control studies, for a dental expert system for
planning orthodontic treatment, for the design of a client administration expert system,
to produce a modeling tool for representing the behavior of aircrew in aircrew
and fixed wing operations, to build an intelligent manual for safety studies in the
disposal of nuclear waste, for software dependability studies, and for conceptual graph
implementation.


Exercises 


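1. Consider the linguistic variable “Age.” Let the term “old” be defined by

$$\mu_{\text{old}}(x) = \begin{cases} 0 & \text{if } x \in [0, 40] \\ \left(1 + \left(\dfrac{x - 40}{5}\right)^{-2}\right)^{-1} & \text{if } x \in (40, 100] \end{cases}$$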





Determine the membership functions of the terms “very old,” “not very old,” 
“more or less old.” 

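2. Let the term “true” of the linguistic variable “Truth” be characterized by the
membership function

$$T(v; \alpha, \beta, \gamma) = \begin{cases} 0 & \text{if } v \le \alpha \\ 2\left(\dfrac{v - \alpha}{\gamma - \alpha}\right)^2 & \text{if } \alpha \le v \le \beta \\ 1 - 2\left(\dfrac{v - \gamma}{\gamma - \alpha}\right)^2 & \text{if } \beta \le v \le \gamma \\ 1 & \text{if } v \ge \gamma \end{cases}$$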





Draw the membership function of “true.” Determine the membership func- 
tions of “rather true” and “very true.” What is the membership function of 
“false” = not “true” and what of “very false”? 

3. What is the essential difference between Baldwin’s definition of “true” and 
Zadeh’s definition? 

4. Let the primary terms “young” and “old” be defined as in example 9-3. 
Determine the secondary terms “young and old,” “very young,” and “not 
very old.” 

5. Let “true” and “false” be defined as in example 9-4. Find the membership 
function of “very very true.” Compare the fuzzy sets “false” and “not true.” 

6. Let the universe X = {1, 2, 3, 4, 5} and “small integers” be defined as A= 
{(1, 1), (2, .5), (3, .4), (4, .2)}. Let the fuzzy relation “almost equal” be 
defined as follows: 





What is the membership function of the fuzzy set B = “rather small integers” 
if it is interpreted as the composition A o R? 

7. What is the relationship between a relational assignment equation and a pos- 
sibility assignment equation? 

8. Which of the definitions of “true” amounts to unity possibility distributions 
and which other important linguistic variables are represented by unity pos- 
sibility distribution? 

9. Consider examples 9-10 and make propositions about cars like Mercedes, 
Volvo, Chevy, and Rolls Royce. 


10 FUZZY SETS AND EXPERT SYSTEMS 


10.1 Introduction to Expert Systems 


During the last three decades, the potential of electronic data processing (EDP) 
has been used to an increasing degree to support human decision making in dif- 
ferent ways. In the 1960s, the management information systems (MISs) created 
probably exaggerated hopes for managers. Since the late 1970s and early 1980s, 
decision support systems (DSSs) found their way into management and 
engineering. The youngest offspring of these developments are the so-called 
knowledge-based expert systems or, for short, expert systems, which have been 
applied since the mid-1980s to solve management problems [Zimmermann 1987, 
p. 310]. It is generally assumed that expert systems will increasingly influence 
decision-making processes in business in the future. 

If one interprets decisions rather generally, that is, including evaluation, 
diagnosis, prediction, etc., then all three types could be classified as deci- 
sion support systems that differ gradually with respect to the following 
properties: 


1. Does the system “optimize” or just provide information? 
2. Is it usable generally or just for specific purposes and areas? 
3. Is it self-contained with respect to procedures and algorithms, or does it 
“learn” and “derive” inference and decision-making rules from knowledge 
that is elicited from a human (expert) and analyzed within the system? 


It can be expected that in the future these decision support systems will contain 
to an increasing degree features of all three types of the above-mentioned 
systems. Even though fuzzy set theory can be used in all three “prototypes,” we 
shall concentrate on “expert systems” only because the need and problem of 
managing uncertainty of many kinds is most apparent there; and hence the application 
of fuzzy set theory is most promising and, in fact, most advanced. In operations 
research (OR), the modeling of problems is normally done by the 
OR specialist. The user then provides input data, and the mathematical model 
provides the solution to the problem by means of algorithms selected by the OR 
specialist. 

In expert systems, the domain knowledge is typically emphasized over formal 
reasoning methods: 


In attempting to match the performance of human experts, the key to solving the 
problem often lies more in specific knowledge of how to use the relevant facts than in 
generating a solution from some general logical principles. “Human experts achieve 
outstanding performance because they are knowledgeable” [Kastner and Hong 1984]. 


Conventional software engineering is based on procedural programming lan- 
guages. The tasks to be programmed have to be well understood, the global flow 
of the procedure has to be determined, and the algorithmic details of each subtask 
have to be known before actual programming may proceed. Debugging often 
represents a huge investment of time, and there is little hope of automatically 
explaining how the results are derived. Later modification or improvement of a 
program becomes very difficult. 


Most of the human activities concerning planning, designing, analyzing, or consulting 
have not been considered practical for being programmed in conventional software. 
Such tasks require processing of symbols and meanings rather than numbers. But more 
importantly, it is extremely difficult to describe such tasks as a step-by-step process. 
When asked, an expert usually cannot procedurally describe the entire process of 
problem solving. However, an expert can state a general number of pieces of knowl- 
edge, without a coherent global sequence, under persistent and trained interrogation. 
Early AI research concentrated on how one processes relevant relations that hold true 
in a specific domain to solve a given problem. Important foundations have been devel- 
oped that enable, in principle, any and all logical consequences to be generated from 
a given set of declared facts. Such general purpose problem solving techniques, 
however, usually become impractical as the toy world used for demonstration is 
replaced by even a simple real one. The realization that knowledge of how to solve 
problems in the specific domain should be a part of the basis from which inferences 
are drawn contributed heavily to making expert systems technology practical [Kastner 
and Hong 1984]. 


While the typical OR model or software package normally supports the expert, 
an expert system is supposed to model an expert and make his or her expert 
knowledge available to nonexperts for purposes of decision making, consulting, 
diagnosis, learning, or research. 

The character of an expert system might become more apparent if we quote 
some of the system characteristics considered to be attributes of expert systems 
[Konopasek and Jayaraman 1984]. Attributes of expert systems include: 


The expert system has separate domain-specific knowledge and problem-solving 
methodology and includes the concepts of the knowledge base and the inference 
engine. 


The expert system should think the way the human expert does. 


Its dynamic knowledge base should be expandable and modifiable and should 
facilitate “plugging in” different knowledge modules. 


The interactive knowledge transfer should minimize the time needed to transfer 
the expert’s knowledge to the knowledge base. 


The expert system should interact with the language “natural” to the domain 
expert; it should allow the user to think in problem-oriented terms. The system 
should adapt to the user and not the other way around. The user should be insu- 
lated from the details of the implementation. 


The principal bottleneck in the transfer of expertise—the knowledge engineer— 
should be eliminated. 


The control strategy should be simple and user-transparent; the user should be 
able to understand and predict the effect of adding new items to the knowledge 
base. At the same time, the strategy should be powerful enough to solve complex 
problems. 


There should be an inexpensive framework for building and experimenting with 
expert systems. 


The expert system should be able to reason under conditions of uncertainty and 
insufficient information and should be capable of probabilistic reasoning. 


An expert system should be able to explain “why” a fact is needed to complete 
the line of reasoning and “how” a conclusion was arrived at. 


Expert systems should be capable of learning from experience. 


Cutting a long story short, Kastner and Hong [1984] provide this definition: 


An expert system is a computer program that solves problems that heretofore required 
significant human expertise by using explicitly represented domain knowledge and 
computational decision procedures [Kastner and Hong 1984]. 


A sample of some other definitions of an expert system can be found in the work 
of Fordyce et al. [1989, p. 66]. The general structure of an expert system is shown 
in figure 10-1 (see also Zimmermann [1987, p. 262]). In the following, the five 
components of such a system are explained in more detail. The knowledge acqui- 
sition module supports the building of an expert system’s knowledge base. 


The subject of knowledge acquisition for knowledge-based systems falls conveniently 
into two parts depending on whether the knowledge is elicited from the experts by 
knowledge engineers or whether that knowledge is acquired automatically by the computer 
using some form of automatic learning strategy and algorithms [Graham and 
Jones 1988, p. 279]. 


A module that aids the knowledge engineer during the process of knowledge 
elicitation could consist of a user-friendly rule editor, an “automatic error- 
checking when rules are being put in, and good online help facilities” [Ford 1987, 
p. 162]. (See also Buchanan et al. [1983, p. 129]). AQUINAS is such a system; 
it is presented by Boose [1989, p. 7]. 

Another way to acquire domain-dependent knowledge is the application of 
machine learning techniques to automatically generate a part of the knowledge 
base. It is expected that rapid improvements will take place in the field of auto- 
matic knowledge acquisition in the future. The interested reader is referred to 
Michalski et al. [1986, p. 3] and Morik [1989, p. 107]. 

The knowledge base contains all the knowledge about a certain domain that 
has been entered via the above-mentioned knowledge acquisition module. Apart 
from special storage requirements and system-dependent structures, the knowl- 
edge base can be exchanged in some expert systems. That means that there can 
be several knowledge bases, each covering a different domain, which can be 
“plugged into” the “shell” of the remaining expert system. 


There are basically two types of knowledge that will need to be represented in the 
system; declarative knowledge and procedural knowledge. The declarative part of the 
knowledge base describes “what” the objects (facts, terms, concepts, . . .) are that are 
used by the expert (and the expert system). It also describes the relationships between 
these objects. This part of the knowledge base is sometimes referred to as the “data 
base” or “facts base.” 

Figure 10-1. Structure of an expert system. 


The procedural part of the knowledge base contains information on how these 
objects can be used to infer new conclusions and ultimately arrive at a solution. Since 
this “how-to” knowledge is usually expressed as (heuristic or other) rules, it is gener- 
ally known as the rule-base [Rijckaert et al. 1988, p. 493]. 


A number of techniques for representing the expert knowledge have been 
developed. These are described by Barr and Feigenbaum [1981/82] in greater 
detail. The four methods most frequently used in expert systems are production 
rules, semantic nets, frames, and predicate calculus (see Zimmermann [1987, 
p. 266]). While we will investigate here the first three of these, the reader is 
referred to Nilsson [1980, p. 132] for the latter. 


Production Rules. Production rules are by far the most frequently used method 
for representing procedural knowledge in an expert system. They are usually of 
the form: “If a set of conditions is satisfied, then a set of consequences can be 
produced.” 


Production rules are used to capture the expert’s rule of thumb or heuristic as well as 
useful relations among the facts in the domain. These if-then rules provide the bulk of 
the domain-dependent knowledge in rule-based expert systems and a separate control 
strategy is used to manipulate the rules. 


If the car won’t start and 
the car lights are dim 
then the battery may be dead. 


Many experts have found rules a convenient way to express their domain knowledge. 
Also, rule bases are easily augmented by simply adding more rules. The ability 
to incrementally develop an expert system’s expertise is a major advantage of rule-based 
schemes [Kastner and Hong 1984]. 


Semantic Nets. One method of encoding declarative knowledge is a semantic 
net. Concepts, categories, or phenomena are represented by a number of nodes associated 
with one another by links (edges). These links may represent causation, 
similarity, propositional assertions, and the like. On the basis of these networks, 
insight into structures can be gained, inferences can be made, and classifications 
can be obtained. In figure 10-2, a semantic net is used to represent declarative 
knowledge about the structure of some vehicles. 


Frames. The concept of a frame for representing knowledge in an expert 
system is introduced by Minsky [1975]: “A frame is a structure that collects 
together knowledge about a particular concept and provides expectations and 
default knowledge about that concept.” Typically, the frame is represented in the 
computer as a group of slots and associated values. The values may themselves 
be other frames. 


Frame: vehicle 
    classes        passenger, motorcycle, truck, bus, bicycle, ... 
    wheels         (integer) 
    propelled by   motor, human feet, ... 

Frame: bicycle 
    is-a           vehicle 
    wheels         2 (default) 
    capacity       1 person (default) 

[Kastner and Hong 1984] 


Figure 10-2. Semantic net (nodes such as “Car” and “Chassis” connected by is-a and has-part links). 


New concepts can often be represented by adding frames or by putting new 
information in “slots” of existing frames. Slots in frames may also be used for 
inference rules and empty slots might indicate missing information. 


The inclusion of procedures in frames joins together in a single representational strat- 
egy two complementary (and, historically, competing) ways to state and store facts: 
procedural and declarative representations [Harmon and King 1985, p. 44]. 
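
The slot-and-default idea behind frames can be illustrated with a minimal sketch. The class name `Frame`, the method `get`, and the rule that a missing slot is looked up along the is-a link are assumptions made for this illustration only; they are not taken from any particular expert system shell.

```python
class Frame:
    """A frame: a named group of slots, optionally linked to a parent frame."""

    def __init__(self, name, is_a=None, **slots):
        self.name = name
        self.is_a = is_a          # parent frame (the "is-a" link), or None
        self.slots = dict(slots)  # slot name -> value (may itself be a Frame)

    def get(self, slot):
        """Return the slot value, falling back on parent frames.

        None is returned for an empty slot, which indicates missing information.
        """
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.is_a
        return None


# The two frames of the example above (slot names adapted to Python identifiers)
vehicle = Frame("vehicle", propelled_by=("motor", "human feet"))
bicycle = Frame("bicycle", is_a=vehicle, wheels=2, capacity="1 person")

print(bicycle.get("wheels"))        # 2
print(bicycle.get("propelled_by"))  # ('motor', 'human feet')
print(bicycle.get("owner"))         # None
```

A new concept is added by creating another `Frame`, and new information by filling a further slot of an existing one.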


The inference engine is a mechanism for manipulating the encoded knowledge 
from the knowledge base in order to form inferences and draw conclusions. The 
conclusions can be deduced in a number of ways that depend on the structure of 
the engine and the method used to represent the knowledge. In the case of 
production rules for knowledge encoding, different control strategies have been 
used that direct input and output and select which rules to evaluate. Two very 
popular strategies are “forward chaining” and “backward chaining.” In the 
former (data-driven), rules are evaluated for which the conditional parts are satisfied. 
The latter strategy (goal-driven) selects a special rule for evaluation. The 
“goal” is to satisfy the conditional part of this rule. If this cannot be achieved 
directly, then subgoals are established on the basis of which a chain of rules can 
be established such that eventually the conditional part of the first rule can be 
satisfied. Further information about inference strategies has been described by 
Waterman [1986]. 


Table 10-1. Expert systems. 

Name                              Domain of expertise        Major technique 

CADIAG-2                          internal medicine          rules* 
[Adlassnig et al. 1985] 

DENDRAL                           molecular structure        rules 
[Lindsay et al. 1980]             elucidation 

EMERGE                            chest pain analysis        rules* 
[Hudson and Cohen 1988] 

ESP                               strategic planning         rules* 
[Zimmermann 1989] 

EXPERT                            rheumatology,              rules* 
[Weiss and Kulikowski 1981]       ophthalmology              hierarchies 

FAULT                             financial accounting       rules* 
[Whalen et al. 1987] 

MYCIN                             infectious disease         rules 
[Buchanan and Shortliffe 1984]    diagnosis and treatment 

OPAL                              job shop scheduling        rules* 
[Bensana et al. 1988] 

PROSPECTOR                        mineral exploration        inference network 
[Benson 1986] 

R1/XCON                           computer configuration     rules 
[McDermott 1982] 

SPERIL                            earthquake engineering     rules* 
[Ishizuka et al. 1982] 

* Includes fuzzy logic. 


The above-mentioned approaches can, of course, be combined. In addition to 
these techniques, expert systems may also contain rather sophisticated mathematical 
algorithms, such as cluster algorithms and optimization and search 
techniques like tabu search (see Glover and Greenberg [1989, p. 119]). This 
development already points in the direction of decision support systems, but 
in many cases it will make the expert system more efficient and even more user-friendly. 
Table 10-1 indicates the areas in which expert systems are 
already available and the techniques they use. By no means does this table claim 
to be exhaustive. 
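
The two control strategies for production rules described in this section, forward chaining and backward chaining, can be sketched in a few lines. This is a simplified illustration under stated assumptions: rules are crisp pairs (conditions, conclusion), the rule base contains no cycles, and the second rule is hypothetical. The function names are ours and do not refer to any existing shell.

```python
def forward_chain(rules, facts):
    """Data-driven: fire every rule whose conditions all hold until nothing new appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and set(conditions) <= known:
                known.add(conclusion)
                changed = True
    return known


def backward_chain(rules, facts, goal):
    """Goal-driven: a goal holds if it is a fact or if all conditions (subgoals)
    of some rule concluding it can be established. Assumes an acyclic rule base."""
    if goal in facts:
        return True
    return any(
        all(backward_chain(rules, facts, c) for c in conditions)
        for conditions, conclusion in rules
        if conclusion == goal
    )


rules = [
    (("car won't start", "car lights are dim"), "battery may be dead"),
    (("battery may be dead",), "check the battery"),  # hypothetical second rule
]
facts = {"car won't start", "car lights are dim"}

derived = forward_chain(rules, facts)
print(sorted(derived))
print(backward_chain(rules, facts, "check the battery"))
```

Forward chaining derives everything that follows from the facts, whereas backward chaining only explores the rules needed for the stated goal.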


10.2 Uncertainty Modeling in Expert Systems 


There are three main reasons for the use of fuzzy set theory in expert systems: 


1. The interfaces of the expert system on the expert side as well as on the user 
side are with human beings. Therefore communication in a “natural” way 
seems to be the most appropriate; and “natural” means, generally, in the lan- 
guage of the expert or user. This suggests the use of linguistic variables as 
they were described in chapter 9. 

2. The knowledge base of an expert system is a repository of human knowl- 
edge, and since much of human knowledge is imprecise in nature, it is usually 
the case that the knowledge base of an expert system is a collection of rules 
and facts that, for the most part, are neither totally certain nor totally con- 
sistent [Zadeh 1983a, p. 200]. The storage of this vague and uncertain portion 
of the knowledge by using fuzzy sets seems much more appropriate than the 
use of crisp concepts and symbolism. 

3. As a consequence of what has been said in point 2, the “management of 
uncertainty” plays a particularly important role. Uncertainty of information 
in the knowledge base induces uncertainty in the conclusions, and therefore 
the inference engine has to be equipped with computational capabilities to 
analyze the transmission of uncertainty from the premises to the conclusions 
and to associate the conclusion with some measure of uncertainty that is 
understandable and properly interpretable by the user. The reader should also 
recall from chapter 1 that imprecision in human thinking and communica- 
tion is often a consequence of abundance of information, that is, the fact that 
humans can often process the required amount of information efficiently only 
by using aggregated (generic) information. This efficiency of human think- 
ing, when modeled in expert systems, might also increase efficiency, that is, 
decrease answering time and so on. 


Most of the expert systems existing so far contain an inference engine on the 
basis of dual logic. The uncertainty is taken care of by Bayesian probability 
theory. The conclusions are normally associated with a certainty or uncertainty 
factor expressing stochastic uncertainty, confidence, likelihood, evidence, or 
belief. Only recently have the designers of expert systems become aware of the 
fact that all of the types of uncertainty mentioned above cannot be treated the 
same way and that a factor of, for example, .8 to express the uncertainty of a con- 
clusion does not mean very much to the user. The expert systems marked with 
an asterisk in table 10-1 are already using fuzzy set approaches in different ways. 
We shall illustrate some of them later. In addition, proposals have been published 
on how fuzzy set theory could be used meaningfully in expert systems. 

The most relevant approaches in fuzzy set theory are fuzzy logic and approximate 
reasoning for the inference engine [Lesmo et al. 1982; Sanchez 1979]; the 
presentation of conditions, indicators, or symptoms by fuzzy sets, especially lin- 
guistic variables, to arrive at judgements about secondary phenomena [Esogbue 
and Elder 1979; Moon et al. 1977; etc.]; the use of fuzzy clustering for diagno- 
sis [Fordon and Bezdek 1979; Esogbue and Elder 1983]; and combinations of 
fuzzy set theory with other approaches, for example, Dempster’s theory of evi- 
dence [Ishizuka et al. 1982], to obtain justifiable and interpretable measures of 
uncertainty. 

In chapter 9 we have already discussed fuzzy logic and its relationship to 
classical dual logic. Here we shall additionally focus on the if-then relationship, 
which is generally assumed to be deterministic. If this is not the case, we have 
to “qualify” its character. 

We shall distinguish three kinds of qualifications: truth qualification, proba- 
bility qualification, and possibility qualification. Qualifications of statements 
are possible or even necessary, independent of whether the statement or phe- 
nomenon is crisp or fuzzy. The kind of modeling, however, will have to be 
different. 

There is a difference between the truth of a part of a statement, a fact, or an 
antecedent and the truth of a compound statement. While the former depends on 
the antecedent’s conformity or compatibility with reality, the latter depends, in 
addition, on the type of connectives used to build the compound statement from 
its parts. We will discuss the former under “matching”; the latter will be consid- 
ered when discussing uncertainty in the process of inference. The reader is 
referred to the first part of this chapter with respect to truth qualification in fuzzy 
logic and approximate reasoning, and also to the section about possibility quali- 
fication further on. 


Probability Qualification. It is not surprising that probability qualifications are 
still the most common way to characterize uncertainty with respect to the occurrence 
of an event (which might be the real occurrence of the predicted, “true,” 
outcome of a conclusion). Probability theory has long been the only way to model 
uncertainty and, therefore, is still the most accepted method. Of course, probability 
has often been abused to model all kinds of uncertainty! 

In the following we shall briefly discuss probability qualifications as point esti- 
mates, intervals, and (possibility) distributions. These approaches assume crisply 
defined events. For models of the probability of fuzzy events, the reader is 
referred to chapter 8 of this book [Dubois and Prade 1980a, pp. 141-144; Yager 
1984, pp. 273-283]. 

Let us consider the rule 


If A then C 
A is true (antecedent) 
Then C is true (conclusion) 


In the most frequently applied Bayesian approach, the Bayes inversion theorem 
is used: 


$$\Pr(C/A) = \frac{\Pr(C)}{\Pr(A)}\,\Pr(A/C) \tag{10.1}$$





Here, Pr(C/A) is the probability of C given A, Pr(C) the probability of C, etc. 
If the antecedent has the possible states $A_i$ and the conclusion has the possible 
states $C_j$, then (10.1) becomes 


$$\Pr(C_j/A_i) = \frac{\Pr(C_j)}{\Pr(A_i)}\,\Pr(A_i/C_j) \tag{10.2}$$


(Determination of probabilities of conclusions in larger inference systems shall 
not be discussed here, because textbooks on probability theory exist in 
abundance.) 
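
As a small numerical illustration of the Bayes inversion, the following sketch computes Pr(C_j/A) for every state of the conclusion. The numbers and state names are hypothetical, and Pr(A) is obtained by the law of total probability, which assumes that the states C_j are mutually exclusive and exhaustive.

```python
def posterior(prior_c, likelihood_a_given_c):
    """Return Pr(C_j / A) for every state C_j, given Pr(C_j) and Pr(A / C_j)."""
    # Pr(A) by total probability over the (exclusive, exhaustive) states C_j
    pr_a = sum(prior_c[c] * likelihood_a_given_c[c] for c in prior_c)
    # Bayes inversion: Pr(C_j / A) = Pr(C_j) / Pr(A) * Pr(A / C_j)
    return {c: prior_c[c] / pr_a * likelihood_a_given_c[c] for c in prior_c}


# Hypothetical numbers: C = state of the battery, A = "car lights are dim"
prior = {"dead": 0.1, "ok": 0.9}
likelihood = {"dead": 0.8, "ok": 0.05}

post = posterior(prior, likelihood)
print(post)
```

Note that every quantity entering the computation has to be supplied as a point value, which is exactly the requirement criticized in the following paragraphs.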

Objections against this approach are, first of all, that aspects of uncertainty 
that are nonprobabilistic in nature may be included. Computationally this 
approach becomes prohibitive if the events (antecedent, conclusion) are consid- 
ered to be fuzzy—represented as fuzzy sets. A second criticism is the need to 
identify point values for the probabilities of events that may by far be overstate- 
ments of our actual knowledge of the likelihood of occurrence of that particular 
event. 

This criticism led Dempster [1967] to suggest the concept of upper and 
lower probabilities and Shafer [1976] to present his theory of evidence. The basic 
concept of this theory is that instead of representing the probability of an event 
A by a point value, Pr(A), it may be bounded by the subinterval $[\Pr_*(A), \Pr^*(A)]$ of 
[0,1]. This theory has some connections to the theory of fuzzy sets and shall, 
therefore, be discussed in some more detail. Rather than following a purely probabilistic 
line of argument (see, e.g., [Dubois and Prade 1982, p. 171; Goodman and 
Nguyen 1985]), we shall follow Zadeh’s line of argument [Zadeh 1984], which 
seems easier to comprehend and closer to “fuzzy thinking.” After an introduction 
to the basic ideas of Dempster and Shafer, we will return to the more common 
representation of their theory. 

Let us consider the following introductory example: 


Table 10-2. A crisp data base. 

Emp 1   Name   No. of children 
        [entries illegible in source] 


Table 10-3. An extended data base. 

Emp 2   Name   No. of children   Between 3 and 5 children? 
        1      1, 2              impossible 
        2      1                 impossible 
        3      4, 5              certain 
        4      5, 6              possible 
        5      6                 impossible 


Example 10-1 


Let us assume we have a data base in which the (atomic) elements are related to 
each other by first-order relations. One of these may be as shown in table 10-2. 
In a simple range query of the type “what portion of the employees in the data 
base have between 1 and 3 children?” we would get, from table 10-2, the answer 
3/5, which may be interpreted as the probability of an employee (contained in 
the data base) having between 1 and 3 children. 

Let us now assume that our knowledge is less precise and that we only know 
the second-order relation shown in table 10-3. We now put the query: “What 
portion of the employees has between 3 and 5 children?” This is obviously possible 
for employees 3 and 4. It is not possible for employees 1, 2, and 5! Therefore, 
the statement “he has between 3 and 5 children” is certainly true for 
employee 3; it is possibly true for employee 4; and it is certainly not true for 
employees 1, 2, and 5. 

In the Dempster-Shafer theory the portion of the intervals for which the statement 
is certainly true is called lower probability. In our example this is 1/5. As 
the upper probability they consider the portion of the elements (intervals) for 
which the statement can (possibly) be true (i.e., 1 minus the portion for which the 
statement cannot be true). In example 10-1 this is 1 - 3/5 = 2/5. 

The lower probability is also called measure of belief and the upper probability 
is called measure of plausibility. It should be noted that in our example the 
employees were considered as atomic elements (all equal probabilities!). If this 
is not the case, the different probabilities of the intervals will have to be 
taken into consideration when determining lower and upper probabilities. Shafer 
calls the sets of attributes (number of children) assigned to the elements focal 
elements and their probabilities of occurrence basic probability assignment. 
In example 10-1 the answer to the question “what is the probability of an 
employee having between 3 and 5 children?” would be: the lower probability 
(degree of belief) is 1/5 and the upper probability (degree of plausibility) 
is 2/5. 
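
The counting argument of example 10-1 can be reproduced with a short sketch. The variable names are ours; the focal elements are the sets of possible numbers of children from table 10-3, and each employee is given the same weight 1/5, as assumed in the example.

```python
from fractions import Fraction

# Possible numbers of children per employee (focal elements), as in table 10-3
focal = {1: {1, 2}, 2: {1}, 3: {4, 5}, 4: {5, 6}, 5: {6}}
query = {3, 4, 5}  # "between 3 and 5 children"

weight = Fraction(1, len(focal))  # all employees are equally probable

# Lower probability (belief): the focal element lies entirely inside the query
belief = sum(weight for f in focal.values() if f <= query)
# Upper probability (plausibility): the focal element overlaps the query
plausibility = sum(weight for f in focal.values() if f & query)

print(belief, plausibility)  # 1/5 2/5
```

Only employee 3 contributes to the belief, while employees 3 and 4 contribute to the plausibility.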

Example 10-1 was a rather intuitive example. Let us now define the uncer- 
tainty measures of the theory of evidence properly. 


Definition 10-1 [Dubois and Prade 1982a, 1985b; Prade 1985; Goodman and 
Nguyen 1985, p. 32] 


Let X be a finite set equipped with a probability measure Pr defined on the set 
P(X) of subsets of X. Consider a point-to-set mapping $\Gamma$ from X to some set S. 
That is, $\forall x \in X$, $\Gamma(x)$ is a subset of S. Let $f \subseteq S$ (f = focal element) and the mapping 
m from P(S) to [0,1] (basic probability assignment) be defined as follows: 


$$m(\emptyset) = 0$$

$$m(f) = \frac{\Pr(\{x \in X,\ \Gamma(x) = f\})}{1 - \Pr(\{x \in X,\ \Gamma(x) = \emptyset\})} \qquad \forall f \subseteq S,\ f \neq \emptyset$$


Then the upper probability or plausibility measure is defined as 


$$\Pr{}^{*}(Q) = \mathrm{PL}(Q) = \sum_{f \cap Q \neq \emptyset} m(f) \tag{10.3}$$


The lower probability, belief function, or credibility measure (Dubois and Prade) 
is defined as 


$$\Pr{}_{*}(Q) = \mathrm{Bel}(Q) = \mathrm{Cr}(Q) = \sum_{f \subseteq Q} m(f) \tag{10.4}$$


In analogy to these measures of uncertainty, doubt or commonality measures and 
disbelief or incredibility measures have been defined [Goodman and Nguyen 
1985, p. 321]. 

Remark: Plausibility and belief are, of course, not unrelated. The following 
properties hold: 
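
$$\mathrm{PL}(S) = \mathrm{Bel}(S) = 1 \tag{10.5}$$

$$\mathrm{PL}(\emptyset) = \mathrm{Bel}(\emptyset) = 0 \tag{10.6}$$

$$\mathrm{PL}(Q) = 1 - \mathrm{Bel}(\neg Q) \tag{10.7}$$

$$\mathrm{PL}(A \cap B) \le \mathrm{PL}(A) + \mathrm{PL}(B) - \mathrm{PL}(A \cup B) \tag{10.8}$$

$$\mathrm{Bel}(A \cup B) \ge \mathrm{Bel}(A) + \mathrm{Bel}(B) - \mathrm{Bel}(A \cap B) \tag{10.9}$$


(10.5) relates to the normalization condition 


$$\sum_{f \in F} m(f) = 1 \tag{10.10}$$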


which may lead to some problems [Zadeh 1984, pp. 6-10]. 

While Bel(Q) obviously considers evidence supporting Q, PL(Q) focuses on 
the evidence supporting the contrary: by (10.7), it is 1 minus the belief in the 
complement of Q. If F contains only singletons, then PL(Q) 
= Bel(Q); that is, these measures reduce to normal probabilities. So far we have 
looked at scalar measures (probabilities) and interval measures (belief, plausibility). 
If we consider probability as a linguistic variable, then a measure for the 
probability of an event is a term of the linguistic variable “probability,” that is, a fuzzy 
set characterized by its membership function. The notions of plausibility and 
belief have also been extended from crisp events (as considered here) to fuzzy 
events. The reader is referred to [Dubois and Prade 1985a, p. 553; Smets 1981]. 
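
The relations between plausibility and belief can be checked numerically on a small case. The sketch below uses a hypothetical basic probability assignment `m` on subsets of S = {1, 2, 3} (not taken from the text) and verifies, for all subsets, that PL(Q) = 1 - Bel(not Q), that Bel(Q) never exceeds PL(Q), that plausibility satisfies the "less than or equal" inequality for intersections, and that belief satisfies the "greater than or equal" inequality for unions.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset({1, 2, 3})
# Hypothetical basic probability assignment; the masses sum to 1
m = {
    frozenset({1}): Fraction(1, 2),
    frozenset({1, 2}): Fraction(1, 4),
    frozenset({2, 3}): Fraction(1, 4),
}


def bel(q):
    """Belief: total mass of the focal elements contained in q."""
    return sum(w for f, w in m.items() if f <= q)


def pl(q):
    """Plausibility: total mass of the focal elements intersecting q."""
    return sum(w for f, w in m.items() if f & q)


subsets = [frozenset(c) for r in range(len(S) + 1)
           for c in combinations(sorted(S), r)]

for q in subsets:
    assert pl(q) == 1 - bel(S - q)   # plausibility versus belief in the complement
    assert bel(q) <= pl(q)
for a in subsets:
    for b in subsets:
        assert pl(a & b) <= pl(a) + pl(b) - pl(a | b)
        assert bel(a | b) >= bel(a) + bel(b) - bel(a & b)

print(bel(frozenset({1, 2})), pl(frozenset({3})))  # 3/4 1/4
```

If all focal elements in `m` were singletons, `bel` and `pl` would coincide and behave like an ordinary probability measure.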


Possibility Qualification. We now return to example 10-1 and assume that in 
table 10-3 the number of children of the various employees are described by pos- 
sibility distributions, see e.g. [Zadeh 1983b]. 

To review, a possibility distribution can formally be described by a fuzzy set. 
One difference between a possibility distribution and a fuzzy set, however, is that 
in a fuzzy set the elements of the support belong to the fuzzy set to various 
degrees while in a possibility distribution the possibilities indicate the degree of 
possibility with which a variable can adopt various values. A discrete possibility 
distribution shall be denoted by 


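$$\Pi = \{(x_i, \pi_i)\}$$

Then (10.3) and (10.4), respectively, satisfy the following axioms [Shafer 1976]: 

$$\mathrm{PL}(A \cup B) = \max\{\mathrm{PL}(A), \mathrm{PL}(B)\} \tag{10.11}$$

$$\mathrm{Bel}(A \cap B) = \min\{\mathrm{Bel}(A), \mathrm{Bel}(B)\} \tag{10.12}$$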


A plausibility measure that satisfies (10.11) is called a possibility measure ($\Pi$), 
and a belief measure that satisfies (10.12) is called a necessity measure (N) 
[Prade 1985; Zadeh 1984]. (The latter is called a “consonant belief function” by 
Shafer.) In contrast to (10.5) through (10.10), possibility measures ($\Pi$) and 
necessity measures (N) have the following properties: 


$$\min\{N(Q), N(\neg Q)\} = 0 \tag{10.13}$$

$$\max\{\Pi(Q), \Pi(\neg Q)\} = 1 \tag{10.14}$$

$$\Pi(Q) < 1 \Rightarrow N(Q) = 0 \tag{10.15}$$

$$N(Q) > 0 \Rightarrow \Pi(Q) = 1 \tag{10.16}$$


Example 10-2 


Let us now assume that the information available concerning the number of children 
of our employees is not as in table 10-3, but as in table 10-4. Let us now 
ask “how possible is it that an employee has 3 or 4 children?” 

If we consider the possibility of 3 or 4 children as 


$$\Pi = \max_{Q \cap f \neq \emptyset} (\pi_i) = \max\{.6\} = .6$$


and the necessity as 


$$N = \min_{Q \cap f = \emptyset} (1 - \pi_i) = \min\{.2, 0, 0, 0, .2, 0, 0\} = 0$$


then our answer would have to be: 

“The possibility of an employee having 3 to 4 children is .6, the necessity is 0.” 
It should be noted that other interpretations and definitions of “necessity” and 
“possibility” measures exist, see e.g. [Dubois and Prade 1985a; Prade 1985]. 
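
The computation of example 10-2 can be retraced with the following sketch, which follows the interpretation used in the example: the possibility is the largest degree attached to a value inside the query, and the necessity is the smallest value of 1 minus the degree over all values outside the query. The data are those of table 10-4; the variable names are ours.

```python
# Possibility distributions of table 10-4:
# employee -> {number of children: degree of possibility}
table = {
    1: {1: 0.8, 2: 1.0},
    2: {1: 1.0},
    3: {4: 0.6, 5: 1.0},
    4: {5: 0.8, 6: 1.0},
    5: {6: 1.0},
}
query = {3, 4}  # "3 or 4 children"

inside = [p for dist in table.values() for x, p in dist.items() if x in query]
outside = [p for dist in table.values() for x, p in dist.items() if x not in query]

possibility = max(inside, default=0.0)
necessity = min((1 - p for p in outside), default=1.0)

print(possibility, necessity)  # 0.6 0.0
```

The necessity is 0 because several values outside the query (for instance 2, 5, and 6) are fully possible.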


Table 10-4. A possibilistic data base. 

Emp 3   Name   Poss. of having x children 
        1      {(1, .8), (2, 1)} 
        2      {(1, 1)} 
        3      {(4, .6), (5, 1)} 
        4      {(5, .8), (6, 1)} 
        5      {(6, 1)} 


Quantification. In human communication and, therefore, also in knowledge 
transfer, statements include quantifiers other than the two quantifiers available 
in dual logic or classical mathematics. Often these quantifiers are implicit rather 
than explicit. An assertion of the type “Frenchmen are very charming” often 
really means “most (or almost all) Frenchmen are charming.” Likewise the proposition 
“Hans is never late” would normally be interpreted as “Hans is late very 
rarely.” 

To model this and other types of quantifiers, fuzzy set theory includes fuzzy 
quantifiers. We shall view a fuzzy quantifier as “a fuzzy number which provides 
a fuzzy characterization of the absolute or relative cardinality of one or more 
fuzzy or nonfuzzy sets” [Zadeh 1982, p. 5]. Zadeh distinguishes between fuzzy 
quantifiers of the first kind (referring to absolute counts), and quantifiers of the 
second kind (referring to relative counts). Examples of the former are: several, 
few, many, etc. Examples of the latter kind are most, many, often, a large frac- 
tion, etc. Quantifiers of the third kind are ratios of quantifiers of the second kind 
(see also in chapter 9). 

Scalar quantifiers are normally modeled using their cardinality or sigma count.
Let us consider the proposition “Vickie has several close friends” [Zadeh 1982,
p. 11]. The fuzzy set “close friends of Vickie” may be represented by


F = {(Enrique, 1), (Ramon, .8), (Elie, .7), (Sergei, .8), (Ron, .7)}

Then the sigma count (cardinality) of F is

$$|F| = 1 + .8 + .7 + .8 + .7 = 4$$


If “several” plays the role of a specified subset of integers 1,..., 10, in which 
4 is assumed to be compatible with the meaning of “several” to the degree .8, the 
above proposition may be modeled as 


Poss{Count(close friends(Vickie)) = 4} = .8 
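
As a rough illustration, the sigma count and the possibility statement above can be computed as follows. The membership grades are those appearing in the sum above; the dictionary `several` is a hypothetical fuzzy subset of the integers in which only the grade .8 for the value 4 comes from the text.

```python
# Membership grades of "close friends of Vickie", as used in the sum above
close_friends = {"Enrique": 1.0, "Ramon": 0.8, "Elie": 0.7, "Sergei": 0.8, "Ron": 0.7}

# Hypothetical meaning of "several" on the integers 1..10; only the grade
# for 4 (.8) is taken from the text, the other grades are assumptions.
several = {3: 0.5, 4: 0.8, 5: 1.0, 6: 1.0, 7: 0.8, 8: 0.5}


def sigma_count(fuzzy_set):
    """Sigma count: the sum of all membership grades."""
    return sum(fuzzy_set.values())


count = round(sigma_count(close_friends))  # nearest integer count
poss = several.get(count, 0.0)             # compatibility of that count with "several"
print(count, poss)
```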


In some cases it might not be appropriate or desirable to express the cardinality
of a fuzzy set as a number, but rather as a fuzzy set. Zadeh proposed three
notions of fuzzy counts based on the concept of α-level cuts:


Definition 10-2 [Zadeh 1982, p. 15] 


Let F be a (discrete) fuzzy set and $F_\alpha$ an α-level cut of the fuzzy set F.
$\mathrm{Card}_\alpha$ represents the cardinality (count) of the elements of an
α-level cut.
The FG-count is then defined to be the fuzzy set

$$FG(F) = \{(i, \sup_\alpha\{\alpha \mid \mathrm{Card}_\alpha \ge i\})\}, \quad i = 0, \ldots, n$$

The FL-count is defined as

$$FL(F) = \{(i, 1 - \sup_\alpha\{\alpha \mid \mathrm{Card}_\alpha \ge i + 1\})\}, \quad i = 1, \ldots, n$$

where the supremum over an empty set is taken to be 0.

The FE-count is the fuzzy set

$$FE(F) = \{(i, \min\{\mu_{FG}(i), \mu_{FL}(i)\})\}, \quad i = 1, \ldots, n$$


The counts of definition 10-2 may be interpreted as follows: The FG-count is the
truth value of the proposition “F contains at least i elements”, the FL-count the
truth value of “F contains at most i elements”, and the FE-count the truth value
of “F contains exactly i elements”.


Example 10-3 [Zadeh 1982, pp. 15-16] 
Let 
$$F = \{(x_1, .6), (x_2, .9), (x_3, 1), (x_4, .7), (x_5, .3)\}$$


The α-level sets are listed in table 10-5. The various counts are


$$FG(F) = \{(0, 1), (1, 1), (2, .9), (3, .7), (4, .6), (5, .3)\}$$

$$FL(F) = \{[(2, .1), (3, .3), (4, .4), (5, .7), (6, 1)] - 1\} = \{(1, .1), (2, .3), (3, .4), (4, .7), (5, 1)\}$$

$$FE(F) = \{(1, .1), (2, .3), (3, .4), (4, .6), (5, .3)\}$$


The normal sigma count would be


$$|F| = \Sigma\,\mathrm{count}(F) = \sum_i \mu_F(x_i) = 3.5$$
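
The counts of example 10-3 can be checked with a small Python sketch. It relies on two observations that are assumptions of this sketch rather than statements of the text: the degree of “at least i elements” equals the i-th largest membership grade, and “at most i elements” is the complement of “at least i + 1 elements”.

```python
F = {"x1": 0.6, "x2": 0.9, "x3": 1.0, "x4": 0.7, "x5": 0.3}


def fg_count(fuzzy_set):
    """Degree to which the set contains at least i elements, i = 0..n."""
    grades = sorted(fuzzy_set.values(), reverse=True)
    counts = {0: 1.0}  # "at least 0 elements" is always true
    for i, grade in enumerate(grades, start=1):
        counts[i] = grade  # the i-th largest grade is the largest alpha with Card >= i
    return counts


def fl_count(fuzzy_set):
    """Degree to which the set contains at most i elements: 1 - FG(i + 1)."""
    fg = fg_count(fuzzy_set)
    n = len(fuzzy_set)
    # rounding only suppresses binary floating-point noise such as 0.30000000000000004
    return {i: round(1.0 - fg.get(i + 1, 0.0), 10) for i in range(n + 1)}


def fe_count(fuzzy_set):
    """Degree to which the set contains exactly i elements: min(FG, FL)."""
    fg, fl = fg_count(fuzzy_set), fl_count(fuzzy_set)
    return {i: min(fg[i], fl[i]) for i in fg}


print(fg_count(F))
print(fl_count(F))
print(fe_count(F))
print(sum(F.values()))  # ordinary sigma count
```

Pairs with membership 0, such as the entry for i = 0 in the FL- and FE-counts, are omitted in the listing of the example.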


Table 10-5. α-level sets.

α      F_α
1      {x3}
.9     {x2, x3}
.7     {x2, x3, x4}
.6     {x1, x2, x3, x4}
.3     {x1, x2, x3, x4, x5}


Matching. By matching problem we mean the approximation of real evidence
by assumed structures or of computational results by communication languages.
In expert systems this problem occurs in two forms: when knowledge (relations
between facts) contained in the knowledge base has to be used on the basis of
observed facts that do not quite coincide with the “models of facts” in the
knowledge base, or when it cannot be decided whether they coincide or not.

The first case is represented by example 9-7, in which the knowledge base
contains only the “fact” red tomatoes while the observed fact is “very red tomato”.
For the second case, consider the rule “if the rod is hot, stop the heating process”.
The observed fact could be “the rod has a temperature of 150°C”. The question
then is “is that rod hot or not hot?”.

Let us call these two types of problems input matching and discuss methods 
for their solution further down. Another matching problem occurs when the result 
of the inference process has been obtained—e.g., as the membership function of 
a fuzzy set. The user of the system, however, does not want the answer as a func- 
tion but in a language close to his own. The problem is then to search for a term 
of a linguistic variable whose membership function is very close to the one 
obtained by the system. This is, of course, a problem of output interpretation and 
we shall call it output matching. 

The input-matching problem is obviously already reduced if the knowledge 
base contains descriptions in the form of fuzzy sets rather than only crisp models. 
Also, it has been suggested that in addition to using similarity relations, truth and 
certainty values be used to model the degree of compatibility of reality and model 
and to introduce it into the inference process. Another promising approach is the 
suggestion by Cayrol, Farreny, and Prade [1982] to use pattern matching where
possibility measures and necessity measures are employed, in order to evaluate 
the semantic similarities between patterns (models) and data. 

Output matching is more a psycholinguistic problem. It occurs primarily if 
approximate or plausible reasoning methods or other fuzzy approaches are used 
in which membership functions (of linguistic variables, for instance) are used. 
Even if at the input level the semantic meaning of data and formal knowledge 
representation coincides satisfactorily, the process of inference may yield mem- 
bership functions that do not fit the membership functions of linguistic variables 
or their terms, as defined beforehand, well enough to communicate the results 
effectively to the user of an expert system. 

Certainty factors or degrees of truth do not relay a missing correspondence 
well enough. Another approach, which seems to be promising but not yet well 
enough developed to be used efficiently, is the linguistic approximation men- 
tioned in example 9-6. 

We shall describe some more recent attempts to apply fuzzy set theory to 
knowledge representation and inference mechanisms in expert systems. 

Although, in a precise environment, production rules are adequate to represent 
procedural knowledge (as was seen in section 10.1), this is no longer true in a fuzzy 
environment. One way to deal with imprecision is to use fuzzy production rules, 
where the conditional part and/or the conclusions part contains linguistic variables 
(see chapter 9). An application of this knowledge-representation technique in the 
area of job-shop scheduling has been given by Dubois [1989, p. 83]. Negoita
[1985, p. 80] gives a basic introduction to fuzzy production rules.

While little work has been done in the field of “fuzzy semantic nets,”
suggestions to fuzzify frames to represent uncertain declarative knowledge, and an
illustrative example, stem from Graham and Jones [1988, p. 67]. The two main
generalizations for arriving at a fuzzy frame are


1. allowing slots to contain fuzzy sets as values, in addition to text, list, and 
numeric values, 
2. allowing partial inheritance through is-a slots. 


As a consequence of the representation of imprecise and uncertain knowledge, 
it is necessary to develop adequate reasoning methods. Since 1973, when Zadeh 
suggested the compositional rule of inference, a lot of work has been done in the 
field of fuzzy inference mechanisms [Dubois and Prade 1988a, p. 67; Zimmer- 
mann 1988, p. 736]. 

Nevertheless, there does not yet exist—and probably never will—a generally 
usable expert system shell that can be applied to all possible contexts. One of the 
reasons is that human reasoning depends on the context, i.e., the person with a 
specific educational background and the situation in which a problem has to be 
solved. The selection of existing models for the “implication” in chapter 9 is one 
indication of this. There are essentially two ways to circumvent this difficulty: 
Either a fuzzy expert system shell has to be designed for a small subset of
contexts (e.g., medical diagnosis problems, technical diagnosis, or management
planning problems), or such a shell will be a toolbox including various ways of
reasoning, uncertainty representations, linguistic approximation, etc., from which
the appropriate approaches have to be selected in a certain context. Since the
second version does not yet exist, we shall turn to some examples of more
dedicated expert systems.


10.3 Applications 


We shall now illustrate the use of fuzzy set theory in expert systems by sketch- 
ing some example “cases” (existing expert systems and published approaches that 
could be used in systems). 


Case 10-1: Linguistic Description of Human Judgments [Freksa 1982] 


Freksa presents empirical results that suggest that more natural, especially
linguistic, representations of cognitive observations yield more informative and
reliable interpretations than do traditional arithmomorphic representations. He
starts from the following assumed chain of cognitive transformations:


object → percept → mental representation → verbal description →
formal description → interpretation.


The suggested representation system for “soft observations” is supposed to have 
the following properties [Freksa 1982, p. 302]: 


1. The resolution of the representation should be flexible to account for varying 
precision of individual observations. 

2. The boundaries of the representing objects should not necessarily be sharp 
and should be allowed to overlap with other representing objects. 

3. Comparison between different levels of resolution of representation should 
be possible. 

4. Comparison between subjective observations of different observers should 
be possible. 

5. The representation should have a small “cognitive distance” to the 
observation. 

6. It should be possible to construct representing objects empirically rather than 
from theoretical considerations. 


The observations are expressed by simple fuzzy sets that can be described by the 
quadruples {A, B, C, D}, illustrated in figure 10-3, with the following interpre- 
tation: It is entirely possible that the actual feature value observed is in the range 
[B, C]; it may be possible that the actual value is in the ranges [A, B] or [C, D], 
but more easily closer to [B, C] than further away; an actual value outside of [A, 
D] is incompatible with the observation. [B, C] is called “core,” and [A, B] and 
[C, D] are called “penumbra” of the possibility distribution. 
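
A minimal Python sketch of such a descriptor follows. The text only requires that possibility rises toward the core; the straight-line penumbra used here is an assumption of the sketch, and the function name `descriptor_membership` is illustrative.

```python
def descriptor_membership(x, a, b, c, d):
    """Possibility degree of feature value x for the quadruple {A, B, C, D}.

    Core [b, c] -> 1, outside [a, d] -> 0, linear penumbra in between
    (the linear shape is an assumption, not prescribed by the text).
    """
    if b <= x <= c:
        return 1.0  # entirely possible: inside the core
    if x <= a or x >= d:
        return 0.0  # incompatible with the observation
    if x < b:
        return (x - a) / (b - a)  # rising penumbra [A, B]
    return (d - x) / (d - c)      # falling penumbra [C, D]


# Hypothetical descriptor with core [4, 6] and penumbra [2, 4] and [6, 10]
for value in (1, 3, 5, 8, 11):
    print(value, descriptor_membership(value, 2, 4, 6, 10))
```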

Figure 10-3. Linguistic descriptors. (Membership plotted over the feature
dimension, with break points A, B, C, D.)

Figure 10-4. Label sets for semantic representation. (Membership plotted over
the label set L1 to L9, marked no, no, -, -, yes, yes, -, no, no.)


The construction of a repertoire of semantic representations for linguistic
descriptors is done in the following way (see figure 10-4):


1. The observer selects a set of linguistic labels that allows for referencing all
possible values of the feature dimension to be described.

2. The repertoire of linguistic labels is arranged linearly or hierarchically in
accordance with their relative meaning in the given feature dimensions.

3. A set of examples containing a representative variety of feature values in the 
given feature dimension is presented to the observer. The observer marks all 
linguistic labels that definitely apply to the example feature value with
“yes” and the labels that definitely do not apply with “no.” The labels that 
have not been marked may be applicable, but to a lesser extent than the ones 
marked “yes.” 

4. From the data thus obtained, simple membership functions are constructed 
by arranging the example objects according to their feature values (using the 
same criterion by which the linguistic labels had been arranged). These 
values form the domain for the assignment for membership values. 

5. Finally, we assign to a given label the membership value “yes” to the range 
of examples in which the given label was marked “yes” for all examples and 
the membership value “no” to the ranges in which the given label was marked 
“no” for all examples. The break-off points between the regions with mem- 
bership value “no” and “yes” are connected by some continuous, strongly 
monotonic function to indicate that the membership of label assignment 
increases the closer one gets to the region with membership assignment “yes” 
[Freksa 1982, p. 303].
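
Steps 3 to 5 can be sketched in Python as below. This is one possible reading, not Freksa's implementation: the core is taken as the range of example values marked “yes”, and the outer break-off points as the nearest values marked “no” on either side. The sample values and the function name `quadruple_from_marks` are hypothetical.

```python
def quadruple_from_marks(samples):
    """Derive (A, B, C, D) for one label from marked examples.

    samples: list of (feature_value, mark) pairs, where mark is "yes",
    "no", or None for examples the observer left unmarked.
    """
    yes_values = [v for v, mark in samples if mark == "yes"]
    if not yes_values:
        raise ValueError("the label was never marked 'yes'")
    b, c = min(yes_values), max(yes_values)  # core: range marked "yes"
    lower_no = [v for v, mark in samples if mark == "no" and v < b]
    upper_no = [v for v, mark in samples if mark == "no" and v > c]
    a = max(lower_no) if lower_no else b  # nearest "no" below the core
    d = min(upper_no) if upper_no else c  # nearest "no" above the core
    return a, b, c, d


# Hypothetical marks for nine examples ordered by feature value
marks = ["no", "no", None, None, "yes", "yes", None, "no", "no"]
samples = list(zip(range(1, 10), marks))
print(quadruple_from_marks(samples))  # (2, 5, 6, 8)
```

Between A and B, and between C and D, any continuous, strongly monotonic function may then be used to connect membership “no” with membership “yes”.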


It is not difficult to imagine how the above technique could be used in expert 
systems for knowledge acquisition and for the user interface. 


Case 10-2: CADIAG-2, An Expert System for Medical Diagnosis 


Expert knowledge in medicine is to a large extent vague. The use of objective
measurements for diagnostic purposes is only possible to a certain degree. The 
assignment of laboratory test results to the ranges “normal” or “pathological” is 
arbitrary in borderline cases, and many observations are very subjective. The 
intensity of pain, for instance, can only be described verbally and depends very 
much on the subjective estimation of the patient. Even the relationship between 
symptoms and diseases is generally far from crisp and unique. Adlassnig and 
Kolarz [1982, p. 220] mention a few typical statements from medical books that 
should illustrate to readers who are not medical doctors the character of avail- 
able information: 


Acute pancreatitis is almost always connected with sickness and vomiting.
Typically, acute pancreatitis begins with sudden aches in the abdomen.


The case history frequently reports about ulcus ventriculi and duodeni.
Bilirubinuria excludes the hemolytic icterus but bilirubin is detectable with
hepatocellular or cholestatic icterus.


They designed and implemented CADIAG-2, for which they stated the follow- 
ing objectives [Adlassnig 1980, p. 143; Adlassnig et al. 1985]: 


1. Medical knowledge should be stored as logical relationships between symp- 
toms and diagnoses. 

2. The logical relationships may be fuzzy; they are not obliged to correspond
to Boolean logic.

3. Frequent as well as rare diseases are offered after analyzing the patient’s 
symptom pattern. 

4. The diagnostic process can be performed iteratively. 

5. Both proposals for further investigations of the patient and reasons for all 
diagnostic results are put out on request. 


To sketch their system, let us use the following symbols:

$S = \{S_1, \ldots, S_m\}$ := set of symptoms
$D = \{D_1, \ldots, D_n\}$ := set of diseases or diagnoses
$P = \{P_1, \ldots, P_q\}$ := set of patients

All $S_i$, $D_j$, and $P_k$ are fuzzy sets characterized by their respective
membership functions:

$\mu_{S_i}$ expresses the intensity of symptom i,
$\mu_{D_j}$ expresses the degree of membership of a patient to $D_j$,
$\mu_{P_k}$ assigns to each diagnosis a degree of membership for $P_k$.

Two aspects of symptom $S_i$ with respect to disease $D_j$ are of particular
interest:


1. Occurrence of $S_i$ in case of $D_j$, and
2. Confirmability of $S_i$ for $D_j$.
This leads to the definition of two fuzzy sets:

$O(x)$, $x \in \{0, 1, \ldots, 100\}$, for occurrence of $S_i$ at $D_j$

and

$C(x)$, $x \in \{0, 1, \ldots, 100\}$, representing the frequency with which
$S_i$ has been confirmed for $D_j$.

The membership functions for these two fuzzy sets are defined to be

$$\mu_O(x) = f(x; 1, 50, 99), \quad x \in X$$
$$\mu_C(x) = f(x; 1, 50, 99), \quad x \in Y$$

where X is the occurrence space, Y is the confirmability space, and f is defined
as follows (see also figure 9-4):

$$f(x; a, b, c) = \begin{cases}
0 & \text{for } x \le a \\
2\left(\dfrac{x-a}{c-a}\right)^2 & \text{for } a < x \le b \\
1 - 2\left(\dfrac{x-c}{c-a}\right)^2 & \text{for } b < x \le c \\
1 & \text{for } x > c
\end{cases}$$
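
The function f can be written down directly. The sketch below assumes the usual quadratic S-shaped form with factor 2 in both middle branches and crossover point b = (a + c)/2, which holds for the parameters (1, 50, 99) used here; the name `s_function` is illustrative.

```python
def s_function(x, a, b, c):
    """S-shaped membership function with 0 below a, 1 above c.

    Assumes b = (a + c) / 2, so that both quadratic branches meet at 0.5.
    """
    if x <= a:
        return 0.0
    if x <= b:
        return 2.0 * ((x - a) / (c - a)) ** 2
    if x <= c:
        return 1.0 - 2.0 * ((x - c) / (c - a)) ** 2
    return 1.0


# Membership over the frequency scale 0..100 with parameters (1, 50, 99)
for x in (0, 1, 25, 50, 75, 99, 100):
    print(x, round(s_function(x, 1, 50, 99), 3))
```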


The $S_iD_j$ occurrence and confirmability relationships are acquired empirically
from medical experts using the following linguistic variables:


i    Occurrence O_i     Confirmability C_i

1    always             always
2    almost always      almost always
3    very often         very often
4    often              often
5    unspecific         unspecific
6    seldom             seldom
7    very seldom        very seldom
8    almost never       almost never
9    never              never
     unknown            unknown


The membership functions of $O_i$ and $C_i$ are shown in figure 10-5. They are
arrived at by applying modifiers (see definition 9-3) to “never” and “always.”
For details of the data acquisition process, see Adlassnig and Kolarz [1982,
p. 226].

Figure 10-5. Linguistic variables for occurrence and confirmability.
(Membership functions $\mu_{O_i}(x)$, $\mu_{C_i}(x)$ of the terms never, almost
never, very seldom, seldom, unspecific, often, very often, almost always, and
always, plotted over the relative frequency.)

Other relationships such as symptom-symptom, disease-disease, and
symptom-disease are also defined as fuzzy sets (fuzzy relations). Possibilistic
interpretations of relations (min-max) are used. Given a patient’s symptom
pattern, the symptom-disease relationships, the symptom-combination-disease
relationships, and the disease-disease relationships yield fuzzy diagnostic
indications that are the basis for establishing confirmed and excluded diagnoses
as well as diagnostic hypotheses.

Three binary fuzzy relations are then introduced: The occurrence relation, $R_O$,
the confirmability relation, $R_C$, both in $X \times Y$, and the symptom
relation, $R_S$, which is determined on the basis of the symptom patterns of the
patients.

Finally, four different fuzzy indications are calculated by means of fuzzy
relation compositions [Adlassnig and Kolarz 1982, p. 237]:


1. $S_iD_j$ occurrence indication $R_1 = R_S \circ R_O$

$$\mu_{R_1}(p, D_j) = \max_{S_i} \min\{\mu_{R_S}(p, S_i), \mu_{R_O}(S_i, D_j)\}$$

2. $S_iD_j$ confirmability indication $R_2 = R_S \circ R_C$

$$\mu_{R_2}(p, D_j) = \max_{S_i} \min\{\mu_{R_S}(p, S_i), \mu_{R_C}(S_i, D_j)\}$$

3. $S_iD_j$ nonoccurrence indication $R_3 = R_S \circ (1 - R_O)$

$$\mu_{R_3}(p, D_j) = \max_{S_i} \min\{\mu_{R_S}(p, S_i), 1 - \mu_{R_O}(S_i, D_j)\}$$

4. $S_iD_j$ nonsymptom indication $R_4 = (1 - R_S) \circ R_O$

$$\mu_{R_4}(p, D_j) = \max_{S_i} \min\{1 - \mu_{R_S}(p, S_i), \mu_{R_O}(S_i, D_j)\}$$
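
The four compositions can be illustrated for a single patient with the Python sketch below. All symptom and disease names and all membership degrees are hypothetical; the code is a generic max-min composition and is not taken from CADIAG-2.

```python
def compose(patient_row, relation):
    """Max-min composition of one patient's symptom degrees with a relation.

    patient_row: {symptom: degree}; relation: {symptom: {disease: degree}}.
    """
    diseases = sorted({d for row in relation.values() for d in row})
    return {
        d: max(
            (min(mu, relation.get(s, {}).get(d, 0.0)) for s, mu in patient_row.items()),
            default=0.0,
        )
        for d in diseases
    }


def complement_row(row):
    return {s: 1.0 - mu for s, mu in row.items()}


def complement_relation(relation):
    return {s: complement_row(row) for s, row in relation.items()}


# Hypothetical data: symptom pattern R_S of one patient, relations R_O and R_C
r_s = {"s1": 0.8, "s2": 0.3}
r_o = {"s1": {"d1": 0.9, "d2": 0.2}, "s2": {"d1": 0.4, "d2": 1.0}}
r_c = {"s1": {"d1": 0.6, "d2": 0.1}, "s2": {"d1": 0.2, "d2": 0.7}}

r1 = compose(r_s, r_o)                       # occurrence indication
r2 = compose(r_s, r_c)                       # confirmability indication
r3 = compose(r_s, complement_relation(r_o))  # nonoccurrence indication
r4 = compose(complement_row(r_s), r_o)       # nonsymptom indication
print(r1, r2, r3, r4)
```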


Similar indications are determined for symptom-disease relationships, and we
arrive at 12 fuzzy relationships $R_j$.
Three categories of diagnostic relationships are distinguished: 


1. Confirmed diagnoses 
2. Excluded diagnoses 
3. Diagnostic hypotheses 


Diagnoses are considered confirmed if

$$\mu_{R_j} = 1 \quad \text{for } j = 1 \text{ or } 6$$

or if the max-min composition of them yields 1.

For excluded diagnoses, the decision rules are more involved; and for
diagnostic hypotheses, all diagnoses are used for which the maximum of the
following pairs of degrees of membership is smaller than .5:

$$\max\{\mu_{R_j}, \mu_{R_k}\} < .5 \quad \text{for } \{j, k\} = \{1, 2\} \text{ or } \{5, 6\} \text{ or } \{9, 10\}$$


CADIAG-2 can be used for different purposes: for example, diagnosing diseases, 
obtaining hints for further examinations of patients, and explanation of patient 
symptoms by diagnostic results. 


Case 10-3: SPERIL I, an Expert System to Assess Structural Damage 
[Ishizuka et al. 1982] 


Earthquake engineering has become an important discipline in areas in which the 
risk of earthquake is quite high. 


Frequently, the safety and reliability of a particular or a number of existing structures
need to be evaluated either as part of a periodic inspection program or immediately
following a given hazardous event. Because only a few experienced engineers can
practice it well to date, it is planned to establish a systematic way for the damage
assessment of existing structures. SPERIL is a computerized damage assessment system as
designed by the authors particularly for building structures subjected to earthquake
excitation [Ishizuka et al. 1982, p. 262].


Useful information for the damage assessment comes mainly from the fol- 
lowing two sources: 


1. visual inspection at various portions of the structure 
2. analysis of accelerometer records during the earthquake 


The interpretation of these data is influenced to a large extent by the particular 
kind of structure under study. Information for damage assessment is usually col- 
lected in a framework depicted in figure 10-6. 

It is practically impossible to express the inferential knowledge of damage 
assessment precisely. Therefore the production rules in SPERIL I are fuzzy. A 
two-stage procedure is used to arrive at fuzzy sets representing the degree of 
damage. First the damage is assessed on a 10-point scale, and then the rating is 
transformed into a set of terms of the linguistic variable “damage.” 

Let d be the damage evaluated on a 10-point scale. Then the relationship
between the terms and the original ratings can be described as follows:


$$T_{\text{no}}(d) = \{(0, 1), (1, .5)\}$$
$$T_{\text{slight}}(d) = \{(1, .5), (2, 1), (3, .5)\}$$
$$T_{\text{moderate}}(d) = \{(3, .5), (4, 1), (5, .7), (6, .3)\}$$
$$T_{\text{severe}}(d) = \{(5, .3), (6, .7), (7, 1), (8, .7), (9, .3)\}$$
$$T_{\text{destructive}}(d) = \{(8, .3), (9, .7), (10, 1)\}$$
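
The second stage of this procedure, turning a rating into term memberships, may be sketched in Python as follows. The dictionary encodes the term definitions above with peak grades of 1; the function name `term_memberships` is illustrative and not part of SPERIL.

```python
# Terms of the linguistic variable "damage": rating -> membership grade
DAMAGE_TERMS = {
    "no": {0: 1.0, 1: 0.5},
    "slight": {1: 0.5, 2: 1.0, 3: 0.5},
    "moderate": {3: 0.5, 4: 1.0, 5: 0.7, 6: 0.3},
    "severe": {5: 0.3, 6: 0.7, 7: 1.0, 8: 0.7, 9: 0.3},
    "destructive": {8: 0.3, 9: 0.7, 10: 1.0},
}


def term_memberships(rating):
    """Membership of a rating on the 10-point scale in every damage term."""
    return {term: grades.get(rating, 0.0) for term, grades in DAMAGE_TERMS.items()}


memberships = term_memberships(5)
print(memberships)                                # moderate: 0.7, severe: 0.3
print(max(memberships, key=memberships.get))      # best-fitting term
```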


The rule associated with node 2 in figure 10-6, for instance, would then read


IF: MAT is reinforced concrete,
THEN IF: STI is no,
THEN: GLO is no with 0.6,
ELSE IF: STI is slight,
THEN: GLO is slight with 0.6,
ELSE IF: STI is moderate,
THEN: GLO is moderate with 0.6,
ELSE IF: STI is severe,
THEN: GLO is severe with 0.6,
ELSE IF: STI is destructive,
THEN: GLO is destructive with 0.6,
ELSE: GLO unknown with 1,

where


MAT = structural material,

GLO = damage of global nature,

STI = diagnosis of stiffness, and

“unknown” stands for the universe set of damage grade

[Ishizuka et al. 1982, p. 263].


To obtain a correct answer by using such knowledge, a rational inference 
mechanism is required to process the rules expressed with fuzzy subsets along 
with uncertainty in an effective manner. 

To include uncertainty, first Dempster’s and Shafer’s probabilities were used 
[Dempster 1967; Shafer 1976]. Thus the conclusions were accompanied by a 
lower and upper probability indicating lower and upper bounds of subjective 
probabilities. (For details, see Ishizuka et al. [1982, pp. 264—266].) 

It was felt that the rules as shown for node 2 could not necessarily be expressed 
as crisp rules. Therefore fuzzy inference rules were introduced in order to arrive 
at a fuzzy damage assessment together with upper and lower probabilities. For
details, the reader is again referred to the above-mentioned source.

Improvements, particularly of the knowledge acquisition phase, have been 
suggested [Fu et al. 1982; Watada et al. 1984]. They either use fuzzy clustering 
or a kind of linguistic approximation. 


Case 10-4: ESP, an Expert System for Strategic Planning 
[Zimmermann 1989] 


Strategic planning is a large heterogeneous area with changing content over time 
and without a closed theory such as is available in other areas of management 
and economics. It deals with the long-range planning of a special company and 
is frequently done for independent autonomous units, called strategic business 
units (SBUs) [Hax and Majluf 1984, p. 15]. One technique for analyzing the 
current and future business position is the business portfolio approach. 

The original idea of portfolio analysis in strategic planning was to describe 
the character of a corporation by the positions of SBUs in a two-dimensional port- 
folio matrix and to try to find strategies aimed at keeping this “portfolio” bal- 
anced. Some of the major problems encountered are given below. 


Dimensionality. It is obvious that two dimensions are insufficient to describe
adequately the strategic position of an SBU. Two dimensions are certainly
preferable for didactical reasons and for presentation, but for realistic
description a multidimensional vectorial positioning would be better.


Data Collection and Aggregation. Even for a two-dimensional matrix, the
dimensions of an SBU must be determined by a rather complex data-gathering
and aggregation process. Factors such as ROI, market share, and market growth
can be obtained without too much difficulty. Other factors to be considered are
combinations of many aspects. It is, therefore, not surprising that intuitive
aggregation and the use of scoring methods are rather common in this context,
although their weaknesses are quite obvious: Aggregation procedures are kept
simple for computational efficiency, but they are very often not justifiable.
Different factors are considered to be independent without adequate verification.
A lot of subjective evaluations enter the analysis with very little control.


Strategy Assignment. In classical portfolio matrixes, broad strategic categories 
have been defined to which basic strategies are assigned. It is obvious that these 
categories are much too rough to really define operational strategies for them. 
One of the most important factors in determining real strategies will be the knowl- 
edge and experience of the strategic planners who transform those very general 
strategic recommendations into operational strategies—a knowledge that is not 
captured in the portfolio matrixes! 


Modeling and Consideration of Uncertainty. In an area into which many ill- 
structured factors, weak signals, and subjective evaluations enter, and which 
extends so far into the future, uncertainty is obviously particularly relevant. 
Unfortunately, however, uncertainty is hardly considered in most of the strategic 
planning systems we know. The utmost that is done is to sometimes attach uncer- 
tainty factors to an estimate and then to aggregate those together with the data in 
a rather heuristic and arbitrary way. 

ESP, an Expert System for Strategic Planning, tries to improve classical 
approaches and to remedy some of their shortcomings. It also provides a 
framework in which strategic planners can analyze strategic information and 
develop more sophisticated strategic recommendations. Its characteristics are as
follows:


Dimensionality. Multidimensional portfolio matrixes are used. For
visualization, two dimensions each can be chosen; the locations of SBUs are
defined by vectors, however. As an example, let us consider the following four
dimensions:


1. Technology Attractiveness
2. Technology Position
3. Market Attractiveness
4. Competitive Position


Figure 10-7. Combination of two two-dimensional portfolios.


If we combine the first two and the last two dimensions we obtain two two- 
dimensional portfolio matrixes which, combined, correspond to a four-dimen- 
sional matrix (see figure 10-7). If each of the two-dimensional matrixes consists 
of nine strategic categories by having three intervals—low, medium, high—on 
each axis, then the combined matrix contains 9 x 9 = 81 strategic positions. 
Graphically, only the two-dimensional matrixes are shown. The positions of the 
combined matrix are only stored vectorially and used for more sophisticated 
policy assignment. 


Data Collection and Aggregation. Each “dimension” is defined by a tree of
subcriteria and categories. Figure 10-8 shows a part of the tree for “Technology
Attractiveness.”

Figure 10-9. Terms of “degree of achievement.”


The input given by the user consists of one linguistic variable for all criteria 
of the leaves (lowest subcriteria in each of the four trees). This linguistic vari- 
able denotes the respective “degree of achievement”; it can be chosen from the 
terms “not at all,” “little,” “medium,” “considerably,” and “full.” These terms are 
represented by trapezoidal membership functions that are characterized by their 
four characteristic values on their supports (see figure 10-9). 

To arrive at the root of each tree, these ratings of the leaves are aggregated 
on every level of the tree by using the γ-operator, described in chapter 3. There
the reader will find other operators (e.g. minimum, product), which can also be 
chosen by the user. It is suggested that this aggregation of linguistic terms, rather 
than of numerical values, be done by aggregating the four characteristic values 
of each trapezoid in order to obtain the respective characteristic value of the 
resulting trapezoid. The last aggregation level of one tree is shown in figure 
10-10. Repeating this procedure for all characteristic values of the membership 
functions of all aggregation steps of each of the four trees leads to a trapezoidal 
membership function for each of the criteria. 
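
The aggregation of characteristic values can be sketched in Python as follows. The sketch assumes the weighted compensatory form $\mu = (\prod_i \mu_i^{\delta_i})^{1-\gamma} (1 - \prod_i (1 - \mu_i)^{\delta_i})^{\gamma}$ with weights $\delta_i$ summing to the number of inputs; this should be checked against chapter 3. The trapezoids and function names are hypothetical.

```python
import math


def gamma_operator(values, gamma, weights=None):
    """Compensatory aggregation of membership degrees (assumed form, see above)."""
    if weights is None:
        weights = [1.0] * len(values)  # equal weights that sum to the number of inputs
    conjunction = math.prod(v ** w for v, w in zip(values, weights))
    disjunction = 1.0 - math.prod((1.0 - v) ** w for v, w in zip(values, weights))
    return conjunction ** (1.0 - gamma) * disjunction ** gamma


def aggregate_trapezoids(trapezoids, gamma, weights=None):
    """Aggregate trapezoids given as 4-tuples, one characteristic value at a time."""
    return tuple(
        gamma_operator(values, gamma, weights) for values in zip(*trapezoids)
    )


# Two hypothetical leaf ratings given by their four characteristic values
leaves = [(0.1, 0.2, 0.4, 0.5), (0.3, 0.5, 0.6, 0.8)]
print(aggregate_trapezoids(leaves, gamma=0.5))
```

With γ = 0 the operator reduces to the product, and with γ = 1 to the algebraic sum, so γ controls the degree of compensation between the criteria.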




Strategy Assignment. As already mentioned, strategy assignment is made on 
the basis of the vectorially described position of an SBU. Two levels can be 
distinguished: 


1. General Policy Recommendation
This is assigned to the position of the SBU as it is defined by the values of
the roots of the trees. In our example, the position would be defined by
technology attractiveness, technology position, competitive position, and market
attractiveness.

2. Detailed Policy Recommendation

Policy recommendations based on the location in the portfolio matrix, which 
in turn is determined by the values of the roots of the evaluation trees only, 
can only be very rough guidelines. The same value at a root of the tree can 
be obtained from very different vectors of values of the nodes of the first 
level of the tree. The values of this vector are, therefore, used to make more 
specific strategic recommendations in addition to the basic policy proposal 
mentioned above. In the example tree shown in figure 10-8, for instance, the 
ratings of “Acceptance,” “Technological Potential,” “Breadth of Application,”
and “Complementarity” would be used for such a specification of the
strategic recommendation. 


Modeling and Consideration of Uncertainty. It is possible for the user of ESP
to interact with this system by defining a special α-level that results in a
rectangle in the portfolio matrixes, as shown in figure 10-11. The α-level denotes
the desired degree of certainty, and the corresponding area in the matrix is a
visualization of the possible position of the considered SBU.
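
How an α-level turns two trapezoidal membership functions into a rectangle can be sketched in Python. The trapezoids below are hypothetical, and the function names are illustrative rather than part of ESP.

```python
def alpha_cut(trapezoid, alpha):
    """Interval of values whose membership in the trapezoid is at least alpha."""
    a, b, c, d = trapezoid
    lower = a + alpha * (b - a)  # point on the rising flank
    upper = d - alpha * (d - c)  # point on the falling flank
    return lower, upper


def portfolio_rectangle(x_trapezoid, y_trapezoid, alpha):
    """Rectangle in a two-dimensional portfolio matrix for a given alpha-level."""
    return alpha_cut(x_trapezoid, alpha), alpha_cut(y_trapezoid, alpha)


# Hypothetical positions of an SBU on two dimensions of the matrix
x_position = (0.2, 0.4, 0.6, 0.8)
y_position = (0.5, 0.6, 0.7, 0.9)
print(portfolio_rectangle(x_position, y_position, alpha=0.5))
```

A higher α-level shrinks the rectangle toward the cores of the two trapezoids; α = 0 yields their full supports.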


ESP: Implementation. We had intended to design ESP by using one of the 
available shells. It turned out, however, that none of the available shells offers 
all the features we needed. Therefore, a combination of a shell (in this case 
Leonardo 3.15) with a program (in Turbo Pascal) had to be used. The basic struc- 
ture of ESP is shown in figure 10-12. 

Knowledge Base I contains primarily rules that assign basic strategy 
recommendations to locations of SBU in multidimensional portfolio matrixes 
and detailed supplementary recommendations to profiles of the first levels 
of trees. Together with the inference engine, it provides for the user the “if- 
then” part and the explanatory function. For this part, the shell Leonardo 3.15 
was used. 

Knowledge Base II contains the structures of the freely definable trees
that determine the location of an SBU in the different dimensions of the
multidimensional matrix. The “Aggregator” computes their values and
characteristic values for the linguistic values for all nodes of the trees on the
basis of available structural knowledge (tree structure, δ-values, and γ-values)
and on the basis of data (μ-values) entered for each terminal leaf by the user. The
information provided by the “Aggregator” is then used for the visual
presentation of two-dimensional matrixes and profiles and also supports the
explanatory module.

All aggregation and visual presentation functions could not be accommodated
by Leonardo 3.15. Therefore, an extra program in Turbo Pascal and the
appropriate bridge programs to Leonardo had to be written.

Figure 10-11. Portfolio with linguistic input.


ESP is fully menu driven. It could be considered as a second-generation
expert system that works with shallow knowledge (KB I) as well as with deep
knowledge (KB II).

Figure 10-12. Structure of ESP.

Exercises 


1. What are the differences between a decision support system and an expert 
system? 

2. Construct examples of domain knowledge represented in the form of rules, 
frames, and networks. Discuss advantages and disadvantages of these three 
approaches. 

3. List, describe, and define at least four different types of uncertainty
mentioned in this book. Associate appropriate theoretical approaches with them.

4. An expert in strategic planning has evaluated linguistically the degree of
achievement of the lowest subcriteria of the criterion “Technology
Attractiveness.” He denotes the corresponding trapezoidal membership functions by the
vectors of the characteristic values. After the first aggregation step, the
evaluation of the first-level criteria results. The respective trapezoidal membership
functions are given by the following vectors of the characteristic values:


Acceptance: (.2, .3, .5, .7) 
Technological Potential: (.6, .7, .9, 1) 
Breadth of Application: (.4, .5, .6, .7) 
Complementarity: (.1, .3, .4, .6) 


Compute the four characteristic values of the criterion “Technology
Attractiveness” by using the γ-operator with γ = .5 and equal weights for all
first-level criteria for the four respective characteristic values given above. Draw
the resulting stripe in the portfolio matrix for α = .8.


11 FUZZY CONTROL


11.1 Origin and Objective 


The objective of fuzzy logic control (FLC) systems is to control complex
processes by means of human experience. Thus fuzzy control systems and expert
systems both stem from the same origins. However, their important differences
should not be neglected. Whereas expert systems try to exploit uncertain
knowledge acquired from an expert to support users in a certain domain, FLC systems
as we consider them here are designed for the control of technical processes. These
processes range in complexity from cameras [Wakami and Terai 1993] and
vacuum cleaners [Wakami and Terai 1993] to cement kilns [Larsen 1981], model
cars [Sugeno and Nishida 1985], and trains [Yasunobu and Miyamoto 1985].
Furthermore, fuzzy control methods have shifted from the original translation of
human experience into control rules to a more engineering-oriented approach,
where the goal is to tune the controller until its behavior is satisfactory, regardless
of whether that behavior resembles human behavior.

Conventional (nonfuzzy) control systems are designed with the help of physical
models of the considered process. The design of appropriate models is
time-consuming and requires a solid theoretical background on the part of the engineer.
Since modeling is a process of abstraction, the model is always a simplified
version of the process. Errors are dealt with by means of noise signals,
supplementary model states, etc. Many processes can, however, be controlled by humans
without any model, and there are processes that cannot be controlled with
conventional control systems but are accessible to control by human operators; for
example, most people with a driving licence can drive a car without any model.
The formalization of the operator’s experience by the methods of fuzzy logic was
the main idea behind fuzzy logic control:


The basic idea behind this approach was to incorporate the “experience” of a human 
process operator in the design of the controller. From a set of linguistic rules which 
describe the operator’s control strategy a control algorithm is constructed where the 
words are defined as fuzzy sets. The main advantages of this approach seem to be the 
possibility of implementing “rule of the thumb” experience, intuition, heuristics, and 
the fact that it does not need a model of the process [Kickert and Mamdani 1978, 
p. 29]. 


Almost all designers of FLC systems agree that the theoretical origin of those
systems is the paper “Outline of a New Approach to the Analysis of Complex
Systems and Decision Processes” by Zadeh [1973b]. It plays almost the same
role that the Bellman-Zadeh [1970] paper titled “Decision Making in a Fuzzy
Environment” does for the area of decision analysis. In particular, the
compositional rule of inference (see definition 9-7) is considered to be the spine of all
FLC models. The original activities centered around Queen Mary College in
London. Key to that development was the work of E. Mamdani and his students 
in the Department of Electrical and Electronic Engineering. Richard Tong, of 
nearby Cambridge, was another key figure in the development of fuzzy control 
theory. The first application of fuzzy set theory to the control of systems was by 
Mamdani and Assilian [1975], who reported on the control of a laboratory model 
steam engine. It is interesting to note that the first industrial application of fuzzy 
control was the control of a cement kiln in Denmark [Holmblad and Ostergaard 
1982]. The area of fuzzy control was neglected by most European and American 
control engineers and managers until the end of the 1980s, when Japanese 
manufacturers launched a wide range of products with fuzzy controlled parts 
and systems. 

Fuzzy control was (and still is) treated with mistrust by many control
engineers. This attitude towards fuzzy control is changing, and most of the progress
in this area is due to control engineers who started with conventional control
theory (and still apply it). “Fuzzy logic” became a marketing argument in Japan
at the end of the 1980s, and popular press articles gave the impression that fuzzy
control systems are cheap, easy to design, very robust, and capable of
outperforming conventional control systems. This is certainly not generally true; the
real situation depends heavily on the system to be controlled. The lack of
practical experience in FLC design and the shortage of well-trained engineers in the
field must also be considered when one decides to implement fuzzy controllers.
FLC is, however, beginning to establish itself as a recognized control paradigm
and will play a major role in control theory in the future.

Figure 11-1. Automatic feedback control.


11.2 Automatic Control 


The process of automatic control of a technical process relies mainly on the com- 
parison of desired states of the process with some measured or evaluated states. 
The controller tries to reach the desired states (setpoints) by adjustment of the 
input values of the process that are identical to the translated output values of the 
controller. Due to the continuous comparison of these values, one gets a
closed-loop system. Usually a noise signal leads to deviations from the setpoints and
thus to dynamically changing controller outputs. Figure 11-1 depicts an
automatic feedback control system.

Conventional control strategies use process models or experimental results as
a basis for the design of the control strategies. The well-known PID controllers
are a widely used design paradigm. They use information about the input-output
behavior of the process to generate the control action. The behavior of the closed
loop is controlled by different gain values that can be adjusted independently by
the control engineer. Modern computer-controlled (direct digital control, DDC)
systems have to deal with sampled values and are therefore modeled as
time-discrete control systems with sampling units. Thus the control action is a
function of the vector of recent errors $e := [e(k), e(k-1), \ldots, e(k-r)]$, where
$k$ is the sampling time, and the vector of the last control outputs
$u := [u(k-1), u(k-2), \ldots, u(k-s)]$. We derive the current control action as
$u(k) := f(e, u)$. Note that $e(k)$ and $u(k)$ can be vectors in systems with many
inputs and outputs (MIMO).
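
As a rough illustration of a control law of the form u(k) = f(e, u), the following
sketch implements the velocity (incremental) form of a discrete PID controller. It
uses the last three errors (r = 2) and the last control output (s = 1). The class
name, the gains kp, ki, kd, and the sampling period dt are assumptions made for
this illustration only and are not taken from the text.

```python
from collections import deque


class IncrementalPID:
    """Velocity-form discrete PID: u(k) = u(k-1) + f(e(k), e(k-1), e(k-2))."""

    def __init__(self, kp, ki, kd, dt):
        # Illustrative gains and sampling period (assumed, not from the text).
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        # Error history, newest first: e(k), e(k-1), e(k-2).
        self.errors = deque([0.0, 0.0, 0.0], maxlen=3)
        self.u_prev = 0.0  # u(k-1)

    def step(self, setpoint, measurement):
        """Compute u(k) from the recent errors and the previous output."""
        self.errors.appendleft(setpoint - measurement)
        e0, e1, e2 = self.errors
        du = (self.kp * (e0 - e1)
              + self.ki * self.dt * e0
              + self.kd / self.dt * (e0 - 2.0 * e1 + e2))
        self.u_prev += du
        return self.u_prev


pid = IncrementalPID(kp=2.0, ki=0.5, kd=0.0, dt=1.0)
u1 = pid.step(setpoint=1.0, measurement=0.0)
u2 = pid.step(setpoint=1.0, measurement=0.0)
```

The error vector and the previous output are the only state the controller keeps,
which is exactly the information the function f is allowed to use.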


11.3 The Fuzzy Controller 


Fuzzy controllers are special DDC systems that use rules to model process knowl- 
edge in an explicit way. Instead of designing algorithms that explicitly define the 
control action as a function of the controller input variables, the designer of a 
fuzzy controller writes rules that link the input variables with the control vari- 
ables by terms of linguistic variables. Consider, for example, the heating system 
in your living room. If the temperature is slightly too low, then you would prob- 
ably want to increase the heating power a bit. If you now want to control the 
room temperature by a fuzzy controller, you just interpret the terms “slightly too 
low” and “a bit” as terms of linguistic variables and write rules that link these 
variables, e.g., 


If temp = “slightly too low,” 
then change of power = “increased by a bit” 


After all rules have been defined, the control process starts with the computation 
of all rule-consequences. Then the consequences are aggregated into one fuzzy 
set describing the possible control actions, which in this case are different values 
of the change of power. These computations are done with the computational 
unit. Since our heating system doesn’t understand a control action like “increased 
by a bit,” the corresponding fuzzy set has to be defuzzified into one crisp control 
action using the defuzzification module. This simple example illustrates the main 
ingredients of a fuzzy controller: the rule base that operates on linguistic
variables, the fuzzification module that generates terms as functions of the crisp input
values (temperature, in this case), and the computational unit that generates the 
terms of the output variables as a function of the input terms and the rules of the 
rule base. Since the controlled process has to be fed with a crisp signal (instead 
of increased by a bit in the example), the result of the computational unit that is 
a term of a linguistic variable has to be transformed into a crisp value. Figure 
11-2 depicts a generic so-called “Mamdani” fuzzy controller. Modifications of 
this scheme are possible and will be explicitly discussed later. 

When designing fuzzy controllers, several decisions regarding the structure
and the methodology have to be made. It is possible to view a fuzzy controller
as a 7-tuple with the entries (input / fuzzification / rules / rule evaluation /
aggregation / defuzzification / output) [compare Buckley 1992]. Possible decision
parameters are as follows:


Input: number of input signals, number of derived states of each input signal, 
scaling of the input signal 


Fuzzification: type of membership functions, mean, spread and peak of 
membership functions, symmetry, crosspoints, continuous or discrete support, 
granularity (number of membership functions) 


Rules: number of rules, number of antecedents, structure of rule base, type of 
membership functions in consequences, rule weights 


Rule evaluation: aggregation operator in the antecedent, inference operator 


Aggregation: aggregation operator combining the results of the individual rules, 
individual rule-based inference (functional approach), or composition-based 
inference (relational approach) 


Defuzzification: defuzzification procedure 


Output: number of output signals (usually determined by problem structure), 
scaling 


We will return to these parameters in more detail later. This classification, 
however, shows that a fuzzy controller is the result of a sequence of decisions by 
the designer. It is therefore not appropriate to talk about the fuzzy controller; one 
should rather explicate which type of controller is under consideration. Many 
modifications of Mamdani’s original controller [Mamdani and Assilian 1975] 
have been proposed since the publication of the original paper in 1975. One 
important and often used modification was introduced by Sugeno [1985b] and 
will be described after the discussion of Mamdani’s original controller. 


11.4 Types of Fuzzy Controllers 
11.4.1 The Mamdani Controller 


The main idea of the Mamdani controller is to describe process states by means 
of linguistic variables and to use these variables as inputs to control rules. We 
start with the assignment of terms to input variables. The base variable is an input 
variable that can be measured or derived from a measured signal or an output 
variable of the controller. In the heating system example, possible base variables
are room temperature, change of room temperature, number of open windows,
outdoor temperature, change of power, etc. This example illustrates that the
number of input signals is far from obvious. The terms of the linguistic variables
are fuzzy sets with a certain shape. It is popular to use trapezoidal or triangular
fuzzy sets due to their computational efficiency, but other shapes are possible. The
linguistic variable “temperature” could, for example, consist of the terms “very low”
(vl), “low” (l), “comfortable” (c), “high” (h), and “very high” (vh), as shown in
figure 11-3.

Figure 11-3. Linguistic variable “Temperature.” [Membership (0 to 1) of the terms
very low, low, comfortable, high, and very high over x = temperature [°C], 0 to 30.]
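
To make the notion of terms as fuzzy sets concrete, the following sketch defines
trapezoidal membership functions and evaluates all terms of a linguistic variable
for one crisp input. The breakpoints below are invented for illustration; they are
not read off figure 11-3 and will not reproduce the degrees of membership used in
the later worked example.

```python
def trapezoid(a, b, c, d):
    """Membership function with support [a, d] and core [b, c] (a <= b <= c <= d)."""
    def mu(x):
        if x < a or x > d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)   # rising edge
        return (d - x) / (d - c)       # falling edge
    return mu


# Hypothetical terms of the linguistic variable "temperature" on [0, 30] degrees C.
TEMPERATURE_TERMS = {
    "very low": trapezoid(0, 0, 8, 13),
    "low": trapezoid(8, 13, 15, 19),
    "comfortable": trapezoid(15, 19, 21, 24),
    "high": trapezoid(21, 24, 26, 28),
    "very high": trapezoid(26, 28, 30, 30),
}


def fuzzify(x, terms):
    """Return the degree of membership of the crisp value x in every term."""
    return {name: mu(x) for name, mu in terms.items()}


degrees = fuzzify(22.0, TEMPERATURE_TERMS)
```

With these assumed breakpoints, a temperature of 22°C belongs partly to
“comfortable” and partly to “high” and not at all to the other three terms.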

Formally, we describe the terms of each linguistic variable $LV_1, \ldots, LV_n$ by
their membership functions $\mu_i^j(x)$, where $i$ indicates the linguistic variable,
$i = 1, \ldots, n$, $j$ indicates the term of the linguistic variable $i$,
$j = 1, \ldots, m(i)$, and $m(i)$ is the number of terms of the linguistic variable
$i$. The number of linguistic variables and the number of terms of each linguistic
variable determine the number of possible rules. In most applications, certain
states can be neglected either because they are impossible or because a control
action would not be helpful. It is therefore sufficient to write rules that cover
only parts of the state space.

The rules connect the input variables with the output variables and are based
on the fuzzy state description that is obtained by the definition of the linguistic
variables. Formally, the rules can be written as


rule r: if $x_1$ is $A_1^{j_1}$ and $x_2$ is $A_2^{j_2}$ and ... and $x_n$ is $A_n^{j_n}$, then $u$ is $A^j$


where $A_i^{j_i}$ is the $j$th term of linguistic variable $i$ corresponding to the
membership function $\mu_i^{j_i}(x_i)$ and $A^j$ corresponds to the membership
function $\mu^j(u)$ representing a term of the control action variable. A reasonable
rule in the heating system example is


if temperature is low and change_of_temperature is negative small,
then power is medium


Table 11-1. Rule base.

temp / change_of_temp    nb    ns    z     ps    pb
vl                       -     b     b     m     m
l                        b     m     m     s     s
c                        -     m     s     s     -
h                        -     s     s     s     -
vh                       m     s     s     -     -


The rule base in systems with two inputs and one output can be visualized by a
rule table where the rows and columns are partitioned according to the terms of
the input variables and the entries are the rule consequences. Assume that we
have defined five terms of the linguistic variable “change_of_temperature”:
“negative big” (nb), “negative small” (ns), “zero” (z), “positive small” (ps), “positive
big” (pb), and three control action terms for the “power”: “small” (s), “medium”
(m), and “big” (b). A possible rule base is then visualized in table 11-1. Empty
entries (marked “-”) refer to states with no explicitly defined rules. The first empty
entry (vl, nb) in table 11-1 refers to a state where the temperature is very low and
falling rapidly. Since the heating system has limited power, even maximal power
would not lead to a comfortable temperature. A rule that covers this situation is
therefore superfluous. One should, however, define a default value that is used as a
controller output if none of the rules fires.
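
A sparse rule base with a default output can be sketched as a lookup table. The
entries below are only those rule consequences that the text spells out in words;
the dictionary layout, the function names, and the default output value are
assumptions made for this illustration.

```python
# Rule consequences stated in the text, keyed by (temperature term, change term).
# The full rule table contains further entries that are not repeated here.
RULES = {
    ("low", "negative small"): "medium",
    ("comfortable", "negative small"): "medium",
    ("comfortable", "zero"): "small",
    ("high", "negative small"): "small",
    ("high", "zero"): "small",
}

DEFAULT_POWER_W = 0.0  # hypothetical crisp output used when no rule fires


def fired_rules(temp_degrees, change_degrees):
    """Return the rules whose two antecedent terms both have positive membership."""
    fired = []
    for (temp_term, change_term), power_term in RULES.items():
        if (temp_degrees.get(temp_term, 0.0) > 0.0
                and change_degrees.get(change_term, 0.0) > 0.0):
            fired.append((temp_term, change_term, power_term))
    return fired


def needs_default(temp_degrees, change_degrees):
    """True if no rule fires, so that the default output has to be used."""
    return not fired_rules(temp_degrees, change_degrees)
```

A state such as “very low” and “negative big” matches no stored rule, so the
controller would fall back on the default value instead of producing no output at all.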

The definitions of linguistic variables and rules are the main design steps when
implementing a Mamdani controller. Before elaborating on the last design step, 
which is the choice of an appropriate defuzzification procedure, we show how 
input values trigger the computation of the control action. The computational core 
can be described as a three-step process consisting of 


1. determination of the degree of membership of the input in the rule- 
antecedent, 

2. computation of the rule consequences, and 

3. aggregation of rule consequences to the fuzzy set “control action.” 


The first step is to compute the degrees of membership of the input values in the
rule antecedents. Employing the minimum-operator as a model for the “and,” we
compute the degree of match of rule $r$ as

$$\alpha_r = \min_{i=1,\ldots,n} \{\mu_i^{j_i}(x_i^{input})\}$$

This concept enables us to obtain the validity of the rule consequences. We
assume that rules with a low degree of membership in the antecedent also have
little validity and therefore clip the consequence fuzzy sets at the height of the
antecedent degree of membership. Formally,

$$\mu_r^{conseq}(u) = \min\{\alpha_r, \mu^j(u)\}$$

The result of this evaluation process is obtained by aggregation of all
consequences using the maximum operator. We compute the fuzzy set of the control
action:

$$\mu^{conseq}(u) = \max_r \{\mu_r^{conseq}(u)\}$$


This computation is a special case of an inference process described in chapter 
10, and other inference methods can be applied. It is important to note that 
Mamdani’s method takes into account all rules in a single stage and that no chain- 
ing occurs. Thus the inference process in fuzzy control is much simpler than in 
most expert systems. 

In our heating system example, we assume that the current temperature is 22°C
and that the change_of_temperature is −0.6°C/min. Thus we get that temperature
is “comfortable” with degree 0.4 and “high” with degree 0.3 (see figure 11-3).
A similar definition of the linguistic variables in the change_of_temperature case
yields “negative small” with degree 0.6 and “zero” with degree 0.2. In table 11-1,
we see that four rules have a degree of match greater than zero:


r10: if temp = “comfortable” and change_of_temp = “negative small,” then power
= “medium”


r11: if temp = “comfortable” and change_of_temp = “zero,” then power = “small”


r13: if temp = “high” and change_of_temp = “negative small,” then power =
“small”


r14: if temp = “high” and change_of_temp = “zero,” then power = “small”


The degrees of match are

$$\alpha_{10} = \min\{0.4, 0.6\} = 0.4$$
$$\alpha_{11} = \min\{0.4, 0.2\} = 0.2$$
$$\alpha_{13} = \min\{0.3, 0.6\} = 0.3$$
$$\alpha_{14} = \min\{0.3, 0.2\} = 0.2$$

Accordingly, the consequences of the rules are

$$\mu_{10}^{conseq}(u) = \min\{0.4, \mu^{medium}(u)\}$$
$$\mu_{11}^{conseq}(u) = \min\{0.2, \mu^{small}(u)\}$$
$$\mu_{13}^{conseq}(u) = \min\{0.3, \mu^{small}(u)\}$$
$$\mu_{14}^{conseq}(u) = \min\{0.2, \mu^{small}(u)\}$$

Figure 11-4. Rule consequences in the heating system example.


Figure 11-4 depicts the resulting fuzzy set of the control action

$$\mu^{conseq}(u) = \max\{\mu_{10}^{conseq}(u), \mu_{11}^{conseq}(u), \mu_{13}^{conseq}(u), \mu_{14}^{conseq}(u)\}$$
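
The three computational steps of the heating example can be retraced with a few
lines of code. The degrees of membership of the inputs and the four rules are
taken from the text; the triangular shapes of the output terms “small” and
“medium” on a power scale from 0 to 100 are assumptions, since the text does not
specify them.

```python
def triangle(a, b, c):
    """Triangular membership function with support (a, c) and peak at b."""
    def mu(u):
        if u <= a or u >= c:
            return 0.0
        return (u - a) / (b - a) if u <= b else (c - u) / (c - b)
    return mu


# Assumed output terms on a hypothetical power scale from 0 to 100.
POWER_TERMS = {
    "small": triangle(0.0, 25.0, 50.0),
    "medium": triangle(25.0, 50.0, 75.0),
}

# Degrees of membership of the crisp inputs, as given in the text.
temp_degrees = {"comfortable": 0.4, "high": 0.3}
change_degrees = {"negative small": 0.6, "zero": 0.2}

# The four rules with a positive degree of match, as listed in the text.
RULES = {
    "r10": ("comfortable", "negative small", "medium"),
    "r11": ("comfortable", "zero", "small"),
    "r13": ("high", "negative small", "small"),
    "r14": ("high", "zero", "small"),
}


def degrees_of_match(rules, temp, change):
    """Step 1: minimum over the antecedent memberships of each rule."""
    return {name: min(temp.get(t, 0.0), change.get(c, 0.0))
            for name, (t, c, _) in rules.items()}


def aggregated_membership(u, rules, alphas, power_terms):
    """Steps 2 and 3: clip each consequence at alpha_r, then take the maximum."""
    return max(min(alphas[name], power_terms[p](u))
               for name, (_, _, p) in rules.items())


alphas = degrees_of_match(RULES, temp_degrees, change_degrees)
```

The dictionary alphas contains the four degrees of match computed above, and
aggregated_membership evaluates the clipped and merged output set at any point u.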


11.4.2 Defuzzification 


Since technical processes require crisp control actions, a procedure that generates 
a crisp value out of one or more given fuzzy (output) sets is required. These 
defuzzification methods are very often based on heuristic ideas, such as, “take the 
action that corresponds to the maximum membership”, “take the action that is 
midway between two peaks or at the center of the plateau”, etc. Of course, these 
methods can also be characterized by their formal (mathematical) properties. Also,
defuzzification is relevant not only for fuzzy control but also for other types of
problems, e.g., multicriteria analysis (see chapter 14), and other areas in which
fuzzy sets have to be transformed into crisp expressions (real numbers, symbols,
etc.). We discuss it here in the context of fuzzy control because historically it
first became relevant in this context.

In this book we will describe and discuss the best-known defuzzification
strategies and analyze their main properties and interrelationships. For the many other
defuzzification approaches that exist, the reader is referred to references where
they are discussed in detail (see, for instance, [Lee 1990; Runkler and Glesner
1993, 1994; Driankov 1993; Yager and Filev 1994; Yager 1996; Runkler 1996;
Li 1996; van Leekwijck and Kerre 1999]).

The crisp value to be chosen should generally be an element of the supports
of the fuzzy sets to be defuzzified. The criteria used to find this element,
however, can rest on very different bases: the type of inference of which the fuzzy
set is a result (see [Li 1996]), special points of the membership functions (e.g.,
maxima or minima), the area below the membership functions, or other indicators.

In decision making, for instance, we want to achieve semantic correctness.
This means defining “characteristic” or “significant” elements, which are probably
those that have the highest membership (maxima) [Runkler and Glesner 1994].
In fuzzy control we are looking for the most important rule base entry, which
might require taking the weights of the rules etc. into consideration.

Another criterion for the choice of the defuzzification method is the scale level on
which the membership function is available (see chapter 16).

In the following we will first describe some elementary and some extended 
defuzzification methods and then compare them with respect to their properties. 


Extreme Value Strategies. These defuzzification strategies use extremal values
of the membership function (generally the maxima) to define the crisp equivalent
value. Let us assume that the membership function is not unimodal (i.e., does not
have a unique maximum) but either has several maxima with the same value of μ(x)
or has a “core,” i.e., a compact subset of the support in which the degree of
membership has the maximum value (a plateau as maximum). Depending on whether
the left end, the right end, or the center of the “core” is considered most
appropriate for defuzzification, one arrives at one of the following strategies:


Left of maximum (LOM) 
Right of maximum (ROM), or 
Center of maximum (COM). 


Definition 11-1
The core of a fuzzy set $A$ is defined as

$$Co(A) = \{x \mid x \in X \text{ and } \neg(\exists y \in X)(A(y) > A(x))\}$$

Then for the LOM-strategy the defuzzified value is

$$u_{LOM} = \min\{u \mid u \in Co\}$$

For the ROM-strategy it is

$$u_{ROM} = \max\{u \mid u \in Co\}$$

and for the COM-strategy it is

$$u_{COM} = \frac{u_{ROM} + u_{LOM}}{2}$$

This should not be confused with the “Mean of Maxima” (MOM) strategy, which
assumes that the fuzzy set has no core but several separate maxima.
Figure 11-5 depicts the above three strategies for our example.
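
On a sampled membership function, the three extreme value strategies reduce to a
few lines. The sketch below takes COM to be the midpoint of the core, and the
sample set is an invented clipped output set, not the one shown in the figure.

```python
def core(samples, tol=1e-9):
    """Elements of the sampled universe whose membership equals the maximum."""
    peak = max(mu for _, mu in samples)
    return [u for u, mu in samples if mu >= peak - tol]


def lom(samples):
    """Left of maximum: smallest element of the core."""
    return min(core(samples))


def rom(samples):
    """Right of maximum: largest element of the core."""
    return max(core(samples))


def com(samples):
    """Center of maximum: midpoint between the two ends of the core."""
    return (lom(samples) + rom(samples)) / 2.0


# Hypothetical clipped output set given as (u, membership) pairs.
clipped = [(0, 0.0), (10, 0.2), (20, 0.4), (30, 0.4), (40, 0.4), (50, 0.1), (60, 0.0)]
```

For this plateau between 20 and 40, the three strategies return the left end, the
right end, and the middle of the plateau, respectively.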
Figure 11-5. Extreme value strategies.


Centroid Strategies (Area Methods). The information taken into account in
the above strategies is very limited. If more of the information that is available
via the membership function of the fuzzy set to be defuzzified is to be considered,
then one normally resorts to centroid strategies. The best-known of these are the
“center of area” and the “center of gravity” strategies.


Center of Area. The COA method chooses the control action that corresponds
to the center of the area with membership greater than zero. The idea of this
method is to aggregate the information about possible control actions that is
represented by the membership function. The solution is a compromise, due to the
fuzziness of the consequences. Formally, the defuzzified value $u_{COA}$ is the
support element that divides the area below a continuous membership function
into two equal parts:

$$\int_{x_{min}}^{u_{COA}} \mu(x)\,dx = \int_{u_{COA}}^{x_{max}} \mu(x)\,dx$$

The procedure can be computationally complex and can lead to unwanted results
if the fuzzy set is not unimodal. The result of the COA defuzzification for the
heating system example is depicted in figure 11-6.

The center of gravity (COG) method is the simplest weighted average and
has a distinct geometrical meaning, namely the center of gravity or center of mass.
From a mathematical point of view, the COG corresponds to the expected value
in probability theory. It is defined as

$$u_{COG} = \frac{\int_U u \cdot \mu(u)\,du}{\int_U \mu(u)\,du}$$
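
Both centroid strategies can be approximated numerically on a sampled membership
function. The sketch below uses the trapezoidal rule; the function names and the
triangular test set are assumptions for illustration, and the interpolation inside
a segment in coa is only approximate.

```python
def cog(us, mus):
    """Center of gravity of a sampled membership function (trapezoidal rule)."""
    num = den = 0.0
    for i in range(len(us) - 1):
        du = us[i + 1] - us[i]
        num += 0.5 * (us[i] * mus[i] + us[i + 1] * mus[i + 1]) * du
        den += 0.5 * (mus[i] + mus[i + 1]) * du
    return num / den  # undefined if the set has zero area


def coa(us, mus):
    """Point that splits the area under the sampled membership function in half."""
    areas = [0.5 * (mus[i] + mus[i + 1]) * (us[i + 1] - us[i])
             for i in range(len(us) - 1)]
    half = sum(areas) / 2.0
    acc = 0.0
    for i, area in enumerate(areas):
        if acc + area >= half:
            # Linear interpolation inside the segment (an approximation).
            frac = (half - acc) / area if area > 0.0 else 0.0
            return us[i] + frac * (us[i + 1] - us[i])
        acc += area
    return us[-1]


# Symmetric triangular set on [0, 10] with its peak at 5, sampled at integers.
grid = list(range(11))
tri = [1.0 - abs(u - 5) / 5.0 for u in grid]
```

For a symmetric set, both methods return the axis of symmetry; they differ as
soon as the set becomes skewed or multimodal.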
Figure 11-6. COA defuzzification.


All those defuzzification strategies might lead to problems if the fuzzy set to be
defuzzified is not compact, i.e., if it really consists of several fuzzy sets between
which there are “forbidden” zones. These are intervals of the action space that
do not belong to the support and from which no element should be chosen as a
defuzzified action. This can, for instance, happen if a car approaches an obstacle
and two possible (fuzzy) strategies are “turn slightly right” and “turn slightly
left.” The defuzzified strategy would most likely be “go straight ahead,” which
is obviously not very desirable.


Example 11-1 [Runkler and Glesner 1993] 


Let us assume a heating system that can be run at high or low degrees (but not
in between). The total range (universe) is u = [0,255], and the two relevant rules
of the inference engine have weights of h and (1 − h).

We shall consider two situations for changing weights: neighboring and
separate membership functions.

Since there is no unique maximum, LOM, ROM, and COM would only consider
the “core” and would, therefore, always stay in “low” for h > (1 − h) and in “high”
for h < (1 − h). For h = .5 the defuzzified values would be extremely different for
LOM and ROM. In any case, they would not change continuously with h.

For COA and COG they would, for h = .5, even be at 127, certainly not a very
desirable value.


Let us now consider the situation shown in figure 11-8. 

The range of 63 < u < 127 is the “forbidden zone”. 

For LOM and ROM we would at least stay out of the forbidden zone, but for COM
(and for h = .5) we would certainly end up in it. For COA and COG we would
also find defuzzified values in the forbidden zone for large ranges of h.

Figure 11-8. Separate membership functions.
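
The effect can be checked numerically. The sketch below assumes, purely for
illustration, that “low” and “high” are rectangular sets on [0, 63] and [127, 255]
clipped at heights h and 1 − h; the actual shapes in figure 11-8 may differ.

```python
def cog_two_sets(h, low=(0, 63), high=(127, 255)):
    """Discrete COG of two rectangular sets with heights h (low) and 1 - h (high).

    The rectangular shapes and interval bounds are assumptions for illustration.
    """
    num = den = 0.0
    for u in range(0, 256):
        if low[0] <= u <= low[1]:
            mu = h
        elif high[0] <= u <= high[1]:
            mu = 1.0 - h
        else:
            mu = 0.0  # forbidden zone: not part of the support
        num += u * mu
        den += mu
    return num / den


def in_forbidden_zone(u):
    """True if u lies strictly between the two supports."""
    return 63 < u < 127
```

With these assumed shapes, a weight of h = 0.8 already places the center of
gravity inside the forbidden zone, whereas h = 1 and h = 0 give the centers of
“low” and “high.”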


These undesirable effects can be avoided by using parameterized defuzzifica- 
tion strategies, such as “Extended Center of Area” (XCOA) or “Extended Center 
of Gravity” (XCOG). 

As an example, we show the XCOA strategy [Runkler and Glesner 1993]:


J uc)” du 


S1 
yXCOA = 


J ue) du 


where S1 and S2, respectively, are the supports of the two fuzzy sets.
This strategy reduces to

Mean value of supports    for α = 0
COA                       for α = 1
MOM                       for α → ∞


For the situation of example 11-1, XCOA jumps for low values of α from 191
to 63. For high values of α it behaves as MOM. For medium α the defuzzified
value slowly slides to the edge of the forbidden zone and then jumps over it to
the opposite edge of the forbidden zone. It never lies in it!


Scale levels and properties of defuzzifiers 


Obviously for nominal scale levels of the universe (see type A membership model 
in chapter 16) a defuzzification does not make sense at all. The first scale level 
from which a defuzzification makes sense at all is an ordinal scale level of the 
universe. Generally a cardinal scale level (interval, ratio or absolute scale) would 
have to be required. 

For the membership functions there are similar requirements. The views, 
however, on which scale levels membership functions are supplied in practice 
diverge considerably (see also chapter 16). 

For the defuzzification strategies, some authors [Runkler 1996; Li 1996; van
Leekwijck and Kerre 1999] have also suggested various desirable properties.

From all the properties suggested, we will select in the following the most
important ones and those with respect to which the defuzzification strategies we
have discussed differ at all:


Property 1: Closed Property 
The defuzzified value of a fuzzy set should be an element of its support. 


Property 2: Fuzzy Singleton 
If a fuzzy set has a positive degree of membership for only one element, then the 
defuzzification should select this element. 


Property 3: Horizontal Movement 
If a fuzzy set is shifted horizontally by a distance d, the defuzzified value should 
make the same movement. 


Property 4: (Strong) Monotony
Monotony in this context means that if D(A) is the defuzzified value of the fuzzy
set A and the degrees of membership are increased on one side of D(A), then
D(A) should move to this side.


Property 5: Balance 
If a fuzzy set is enlarged or reduced on both sides of D(A), then D(A) should not 
change. 


Table 11-2. Properties of defuzzifiers.

                          Property
Strategy   1     2     3     4     5     6     7     8     9     10
LOM        Yes   Yes   Yes   Yes   No    Yes   No    No    Yes   No
ROM        Yes   Yes   Yes   Yes   No    Yes   No    No    Yes   No
COM        No    Yes   Yes   Yes   No    Yes   Yes   No    Yes   No
COA        No    Yes   Yes   Yes   Yes   No    Yes   No    No    Yes
COG        No    Yes   Yes   Yes   No    No    Yes   No    No    Yes
XCOA       Yes   Yes   Yes   Yes   Yes   No    Yes   No    No    Yes


Property 6: Strong Vertical Translation 
The defuzzified value stays unchanged if a constant is added to all membership 
values. 


Property 7: Equality
If two convex fuzzy sets A and B have the same level center curves, then they
should have the same defuzzified value. Here “level center curves” are curves
that divide each α-level of a fuzzy set in two equal parts.


Property 8: t-norm property
If two fuzzy sets A and B are combined by a t-norm, then the defuzzified value
of A ∩ B should be in the interval bounded by the defuzzified values of the two
fuzzy sets.


Property 9: t-conorm property
If two fuzzy sets A and B are combined by a t-conorm, the defuzzified value of
this combined fuzzy set should be in the interval bounded by the defuzzified
values of A and B.


Property 10: Continuity 
A small variation in any of the degrees of membership should not result in a big 
change of the defuzzified value. 


Table 11-2 shows which of the described defuzzification strategies has which
property.
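
Some of these properties are easy to check numerically for a given strategy. The
sketch below tests the fuzzy singleton and horizontal movement properties for a
discrete center of gravity and for LOM; the sample set and the function names are
invented for illustration.

```python
def cog(samples):
    """Discrete center of gravity of (u, membership) pairs."""
    total = sum(mu for _, mu in samples)
    return sum(u * mu for u, mu in samples) / total


def lom(samples):
    """Left of maximum: smallest u with maximal membership."""
    peak = max(mu for _, mu in samples)
    return min(u for u, mu in samples if mu == peak)


def shift(samples, d):
    """Move a sampled fuzzy set horizontally by the distance d."""
    return [(u + d, mu) for u, mu in samples]


# Hypothetical sampled fuzzy set and a fuzzy singleton.
fuzzy_set = [(0, 0.0), (1, 0.5), (2, 1.0), (3, 1.0), (4, 0.25)]
singleton = [(7, 0.3)]

# Horizontal movement: the defuzzified value moves by the same distance d.
d = 10
cog_moves_with_set = abs(cog(shift(fuzzy_set, d)) - (cog(fuzzy_set) + d)) < 1e-9
lom_moves_with_set = lom(shift(fuzzy_set, d)) == lom(fuzzy_set) + d

# Fuzzy singleton: the only element with positive membership is selected.
cog_selects_singleton = abs(cog(singleton) - 7) < 1e-9
```

Such checks cannot prove that a property holds in general, but they are a quick
way to detect violations for a concrete implementation.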

So far we have discussed formal mathematical properties which can be valu- 
able when deciding which defuzzification should be used. In addition, however, 
other criteria may turn out to be important. 


1. Computational Effort. Is the method slow or fast when implemented as
an algorithm? Does the situation require a fast method (for instance, in
on-line embedded control), or is time not a relevant dimension (as is often the
case in decision making)?


2. Inference. Do we want the defuzzification to take into consideration the 
type of inference we are using or shall it even adapt to changes in the 
inference engine? 


3. Plausibility. Does the defuzzification method yield a plausible control 
action and is it stable or oversensitive? 


Other criteria are possible (see, e.g., Driankov et al. [1993] and Pfluger, Yen, and 
Langari [1992]) and depend on the application under consideration. The choice 
of an appropriate defuzzification procedure can therefore be compared to the 
choice of an appropriate aggregation operator as discussed in chapter 3. 


11.4.3 The Sugeno Controller


An often-used modification of Mamdani’s controller was presented by Sugeno 
[1985b] and Sugeno and Nishida [1985]. The idea is to write rules that have fuzzy 
antecedents, equivalent to the Mamdani controller, and crisp consequences that 
are functions of the input variables. The rule results are aggregated as weighted 
sums of the control actions corresponding to each rule. The weight of each rule 
is the degree of membership of the input value in the rule antecedent as com- 
puted in the Mamdani controller. A defuzzification procedure is therefore super- 
fluous. A rule can formally be written as 


rule r: if $x_1$ is $A_1^{j_1}$ and $x_2$ is $A_2^{j_2}$ and ... and $x_n$ is $A_n^{j_n}$, then $u$ is 
$f_r(x_1, x_2, \ldots, x_n)$ 


where the variables are defined as in the Mamdani case. The consequence func- 
tion, which depends on the input variables, is usually linear, but other types may 
be used. In the heating system example, we may write a rule like 


if temperature is low and change_of_temperature is negative small 
then power = 400 - 120 · temp - 23 · delta_temp [W] 


The definition of a functional relationship is not straightforward but allows the 
identification of parameter values in the consequence function. 

The control action is computed with the help of the degrees of membership 
that are evaluated exactly as in the Mamdani controller. We obtain 


$$u^{\text{Sugeno}} = \frac{\sum_r \alpha_r \cdot f_r(x_1, x_2, \ldots, x_n)}{\sum_r \alpha_r}$$


It is possible to view the linear Sugeno controller as a linear controller that is 
valid around a fuzzily defined operating point. The control algorithm in the 
operating point is perfectly valid and loses validity with decreasing degree of 
membership, which is computed with the help of the rule antecedents. Thus the control 
strategy is a combination of several linear control strategies defined at different 
points in the state space. 
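
The weighted-sum aggregation of the Sugeno controller can be sketched in a few 
lines of Python. This is only an illustration under stated assumptions: the 
triangular membership functions, the use of the min operator for the "and" in 
the antecedent, the helper names (`tri`, `sugeno_output`), and all numerical 
parameters are hypothetical and are not taken from the heating system example. 

```python
def tri(x, left, peak, right):
    """Triangular membership function with modal value `peak`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def sugeno_output(rules, inputs):
    """Weighted average of the crisp rule consequences.

    rules:  list of (antecedent, consequence) pairs; the antecedent is a list
            of membership functions (one per input), the consequence is a
            crisp function of the inputs.
    """
    numerator = 0.0
    denominator = 0.0
    for antecedent, consequence in rules:
        # Degree of match alpha_r; min is assumed for the "and" connective.
        alpha = min(mu(x) for mu, x in zip(antecedent, inputs))
        numerator += alpha * consequence(*inputs)
        denominator += alpha
    if denominator == 0.0:
        raise ValueError("no rule fires for this input")
    return numerator / denominator


# Two hypothetical rules over (temp, delta_temp) with linear consequences.
rules = [
    ([lambda t: tri(t, 0.0, 10.0, 20.0), lambda d: tri(d, -2.0, -1.0, 0.0)],
     lambda t, d: 100.0 - 2.0 * t - 5.0 * d),
    ([lambda t: tri(t, 10.0, 20.0, 30.0), lambda d: tri(d, -2.0, -1.0, 0.0)],
     lambda t, d: 50.0 - 1.0 * t - 5.0 * d),
]

u = sugeno_output(rules, (15.0, -1.0))
```

For the input (15, -1), both hypothetical rules fire with degree 0.5, so the 
control action is the plain average of the two linear consequences (75 and 40). 
No defuzzification step appears anywhere in the computation. 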


11.5 Design Parameters 


The design of a fuzzy controller involves decisions about a number of important 
design parameters that can be determined before the actual control starts and/or 
on-line. Important design parameters are the fuzzy sets in the rules, the rules 
themselves, scaling factors in input and output, inference methods, and defuzzi- 
fication procedures. Although other design parameters also play important roles, 
we want to focus on the parameters that have to be defined in almost all control 
applications. Defuzzification has already been discussed thoroughly and infer- 
ence is discussed in connection with expert systems (chapter 10). 


11.5.1 Scaling Factors 


The easiest-to-change parameters are the scaling factors. The scaling factors scale 
the base variables of the linguistic variables. Formally, the input and output 
variables are calculated as $x_i = sf_i \cdot \tilde{x}_i$, where $x_i$ is the variable that is used in the 
rule, $\tilde{x}_i$ is the corresponding unscaled variable, and $sf_i$ is the scaling factor of 
variable $i$. Scaling factors allow the definition of normalized base variables of 
the corresponding linguistic variables and play a role similar to the gain in 
conventional control systems. It is obvious that alteration of the scaling factors 
has a significant impact on the closed-loop behavior of an FLC system. 
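
As a small illustration of the gain-like role of scaling factors, the following 
Python sketch multiplies each raw variable by its scaling factor. The function 
name `scale`, the variable ranges, and the factor values are assumptions chosen 
only for the example. 

```python
def scale(raw_values, scaling_factors):
    """Multiply every raw variable by its scaling factor."""
    return [sf * x for sf, x in zip(scaling_factors, raw_values)]


# Hypothetical ranges: temperature error in [-20, 20], change in [-2, 2],
# both mapped onto the normalized base variable range [-1, 1].
scaling_factors = [1 / 20, 1 / 2]
raw = [5.0, -1.0]

normalized = scale(raw, scaling_factors)
# Doubling the factors acts like doubling the gain on the controller inputs.
doubled = scale(raw, [2 * sf for sf in scaling_factors])
```

With doubled scaling factors the same raw input lands twice as far from zero on 
the normalized base variable, so different rules may fire for it. 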


11.5.2 Fuzzy Sets 


The fuzzy sets describe terms of linguistic variables. Once the shape of the fuzzy 
sets is determined, several other parameters have to be adjusted. Here, we will 
assume that the membership functions have a triangular shape, which is by no 
means necessary but is often done in fuzzy control applications. The modal 
value or peak value of a membership function is the value of the base variable 
where the membership function is equal to one. The left and right width of the 
membership function is the distance from the peak value to the first value of 
the base variable on the left or right side of the peak value, respectively, that 
has a zero membership. The cross point between two membership functions is the 
value of the base variable where both membership functions assume the same 
membership value greater than zero. The cross point level is the membership at 
the cross point. Clearly, two membership functions may have more than one cross 
point. We therefore define the cross point ratio as the number of cross points 
between two membership functions. Figure 11-9 depicts a linguistic variable with 
three fuzzy sets and the corresponding parameters. 

Figure 11-9. Parameters describing the fuzzy sets. 

Several rules of thumb can be formulated using the above definitions. The 
reader should, however, be aware of the empirical character of these rules, i.e., 
there are no globally valid proofs showing their validity. A common rule claims 
that all values of the base variable should have a membership greater than zero 
in at least one membership function corresponding to one of the terms. It is also 
usual to demand that two adjacent membership functions interact, i.e., that the 
cross point ratio is equal to one for those membership functions. It is therefore 
often assumed that the cross point ratio between neighboring membership 
functions is equal to one and that the cross point level is 0.5 [Driankov et al. 1993, 
p. 120]. 

Next, we will focus on symmetry, which is achieved if the left and the right 
width are equal. Assume that we have designed a fuzzy controller with a single 
input, a single rule with a one-term linguistic variable in the consequence, and 
COA defuzzification. Then the Mamdani controller will clip the membership 
function of the rule consequence at the height of the membership function in 
the rule antecedent. If the input matches the rule antecedent with membership 
one, then we would expect to get the peak value of the rule consequence. This 
would only be the case if the membership function of the rule consequence is 
symmetrical. This dependency is shown in figure 11-10 for a nonsymmetrical 
fuzzy set in the rule consequence. 

Figure 11-10. Influence of symmetry. 

Figure 11-11. Condition width. 
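
The influence of symmetry on COA defuzzification can be checked numerically. 
The following Python sketch is an illustration only: the two triangular 
consequence sets, the helper names, and the midpoint-rule integration are 
assumptions made for the example. 

```python
def tri(x, left, peak, right):
    """Triangular membership function with modal value `peak`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def coa(mu, lo, hi, steps=10000):
    """Center of area of `mu` on [lo, hi], approximated by the midpoint rule."""
    dx = (hi - lo) / steps
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        m = mu(x)
        numerator += x * m
        denominator += m
    return numerator / denominator


def clip(mu, alpha):
    """Mamdani clipping of a consequence set at height `alpha`."""
    return lambda x: min(alpha, mu(x))


def symmetric(x):
    return tri(x, 0.0, 5.0, 10.0)    # peak 5, left width = right width = 5


def asymmetric(x):
    return tri(x, 0.0, 2.0, 10.0)    # peak 2, left width 2, right width 8


# The antecedent is matched with membership one, so nothing is cut off.
u_symmetric = coa(clip(symmetric, 1.0), 0.0, 10.0)
u_asymmetric = coa(clip(asymmetric, 1.0), 0.0, 10.0)
```

For the symmetric set the defuzzified value coincides with the peak value 5. 
For the nonsymmetrical set the center of area lies at 4, the centroid of the 
triangle, although the peak value is 2. 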

The condition width states that the left-width of the right membership func- 
tion is equal to the right-width of the left membership function and that they are 
both equal to the length of the interval between the peak values of the two adja- 
cent membership functions [Driankov et al. 1993, p. 122]. This rule yields 
smoothly changing control values and avoids large steps. A linguistic variable 
that satisfies this condition is shown in figure 11-11. 
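
A triangular partition that satisfies the condition width can be generated 
directly from the peak values. The Python sketch below is a minimal 
illustration; the peak values and the treatment of the outermost terms (their 
outer width is simply copied from the inner one) are assumptions. 

```python
def tri(x, left, peak, right):
    """Triangular membership function with modal value `peak`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def partition(peaks):
    """Return (left, peak, right) triples whose widths equal the distance
    between adjacent peak values. Requires at least two peaks."""
    terms = []
    for i, peak in enumerate(peaks):
        left = peaks[i - 1] if i > 0 else peak - (peaks[1] - peaks[0])
        right = (peaks[i + 1] if i < len(peaks) - 1
                 else peak + (peaks[-1] - peaks[-2]))
        terms.append((left, peak, right))
    return terms


peaks = [10.0, 18.0, 26.0]            # hypothetical peak values
terms = partition(peaks)

# Adjacent terms cross exactly once, halfway between their peaks.
cross_levels = [tri(14.0, *t) for t in terms]
# Between two peaks the memberships of the neighboring terms add up to one.
total_membership = sum(tri(16.0, *t) for t in terms)
```

With this construction the cross point level between neighboring terms is 0.5, 
and every value between the outermost peaks has nonzero membership in at least 
one term. 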


11.5.3 Rules 


The entire knowledge of the system designer about the process to be controlled 
is stored as rules in the knowledge base. Thus the rules have a basic influence on 
the closed-loop behavior of the system and should therefore be acquired thor- 
oughly. The development of rules may be time-consuming, and designers often 
have to translate process knowledge into appropriate rules. Sugeno and Nishida 
[1985] mention four ways to find fuzzy control rules: 


1. the operator’s experience 
2. the control engineer’s knowledge 
3. fuzzy modeling of the operator’s control actions 
4. fuzzy modeling of the process 


We add the following sources that may also be used: 


5. crisp modeling of the process 
6. heuristic design rules 
7. on-line adaptation of the rules 


Usually a combination of some of these methods is necessary to obtain good 
results. As in conventional control, increased experience in the design of fuzzy 
controllers leads to decreasing development times. 


11.6 Adaptive Fuzzy Control 


Many processes have time-variant parameters due to continuous alteration of 
the process itself. This well-known phenomenon has led to the development of 
adaptive controllers that change their control behavior as the process changes. 
This adjustment is called adaptation. It is natural for adaptive fuzzy controllers 
to change the same controller parameters that a designer may change. Therefore 
most adaptive FLC systems change the shape of the membership functions, the 
scaling factors, etc. It is common to distinguish between self-organizing 
controllers [Procyk and Mamdani 1979], which modify their rules, and self-tuning 
controllers [e.g., Bartolini et al. 1982], which essentially modify the fuzzy set 
definitions. Since adaptive controllers work automatically, a 
monitor has to be found that detects changes in the process. Two common 
methods can be distinguished: 


1. The performance measure approach, where the closed-loop behavior is eval- 
uated by certain performance criteria such as overshoot, rise-time, etc. 

2. The parameter estimator approach, where a process model is continuously 
updated using sampled process information. 
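
Performance criteria of the kind named in the first approach can be computed 
from a sampled step response. The following Python sketch is only an 
illustration: the 10%-90% definition of the rise time, the function names, and 
the sample data are assumptions, and a positive set point reached from zero is 
presupposed. 

```python
def overshoot(response, setpoint):
    """Peak overshoot as a fraction of the set point (0 if never exceeded)."""
    return max(0.0, (max(response) - setpoint) / setpoint)


def rise_time(times, response, setpoint, low=0.1, high=0.9):
    """Time between first reaching low*setpoint and first reaching
    high*setpoint (assumed 10%-90% definition)."""
    t_low = next(t for t, y in zip(times, response) if y >= low * setpoint)
    t_high = next(t for t, y in zip(times, response) if y >= high * setpoint)
    return t_high - t_low


# Hypothetical closed-loop step response sampled once per time unit.
times = [0, 1, 2, 3, 4, 5, 6]
response = [0.0, 4.0, 12.0, 19.0, 22.0, 21.0, 20.0]
setpoint = 20.0

measured_overshoot = overshoot(response, setpoint)
measured_rise_time = rise_time(times, response, setpoint)
```

A monitor built on such measures could, for instance, trigger an adaptation 
step whenever the overshoot exceeds a prescribed bound. 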


It is usually easier to define appropriate performance measures than to find 
process models that can be updated continuously and that are valid over a wide 
range of the state space. An overview of the area of adaptive fuzzy controllers is 
given by Driankov et al. [1993], and researchers continue to work actively in the 
field. Popular design methods currently include the combination of fuzzy 
controllers with neural network methods [e.g., Berenji 1992; Berenji and Khedar 
1992] and genetic algorithms [e.g., Hopf and Klawonn 1993; Lee and Tagaki 
1993]. 


11.7 Applications 


Fuzzy control certainly is the branch of fuzzy set theory with the most applica- 
tions, and their number is steadily growing. The application boom was started by 
Japanese manufacturers who applied fuzzy logic to processes ranging from home 
appliances to industrial control. The first major book containing applications of 
FLC was edited by M. Sugeno [1985a] and shows that the term “fuzzy control” 
is not narrowly interpreted as applications of the Mamdani or Sugeno controller 
to a certain process but includes other fuzzy logic techniques such as fuzzy 
algebra as well. It is also worth mentioning that most successful applications 
combine FLC systems with conventional control strategies into hybrid 
systems. 

We now present several applications of fuzzy control without going into detail. 
Interested readers may consult the original literature. 


11.7.1 Crane Control 


Cranes are widely used in industrial assembly systems where heavy loads have 
to be transported. Today, modern cranes reach a top speed of 160 m/min and an 
acceleration of up to 2 m/s² [Behr 1994]. A container crane is depicted in figure 
11-12. One of the main problems that have to be taken into account in the control 
of such a crane system is that the load may start to swing. This can be avoided 
with the help of mechanical constructions such as telescopes and stays or elec- 
tronic loss control. These methods are, however, expensive, and the construction 
depends on the specific crane under consideration. In contrast, it was observed 
that an experienced operator was able to control a crane satisfactorily without 
such advanced devices. This was the motivation for the design of an FLC system 
for crane control. 

The crane control depends on the mode of operation: one distinguishes 
between manual operation, where an operator controls the crane and the objec- 
tive of the fuzzy controller is to avoid swinging, and automatic operation, where 
a certain position has to be reached. Here we focus on automatic operation. 

The automatic operation mode can be divided into three different phases of 
motion: acceleration, normal motion, and positioning. Figure 11-13 depicts the 
typical behavior of the speed in the different phases. 

Figure 11-12. Container crane [von Altrock 1993]. 

Figure 11-13. Phases of motion. 


Different controllers were designed for the three phases. Input values were the 
position, the speed, the length of the pendulum, the angle of the pendulum, and 
in some cases the mass of the load. When the mass was unknown, a fuzzy esti- 
mator system was activated that calculates the mass as a function of the observed 
system behavior. The controllers were implemented on a fuzzy processor for 
real-time control of the crane. 

Figure 11-14. Input variables [Sugeno and Nishida 1985, p. 106]. 


11.7.2 Control of a Model Car 


One of the most difficult processes to control with conventional control methods 
is a car. The mathematical models are large and nonlinear, and simple controllers 
such as PID controllers do not yield satisfactory results. Most people can, 
however, drive a car without any mathematical model, and it is clear that they 
use their knowledge to control the car. 

Sugeno and Nishida [1985] were the first to implement and publish the results 
they obtained with a fuzzy-controlled model car. The fuzzy control rules were 
derived by modeling an expert’s driving actions. Four input variables were used: 
$x_1$ = distance from entrance of corner, $x_2$ = distance from inner wall, 
$x_3$ = direction (angle) of car, and $x_4$ = distance from outer wall. The four 
variables are depicted in figure 11-14. 

These four input variables are used as inputs to a Sugeno controller with 20 
rules. The results were very encouraging and are depicted in figure 11-15. It is 
worthwhile to mention that all rules were derived from an experienced driver’s 
control actions with an identification procedure. 

Whereas the study by Sugeno and Nishida treated static problems, von Altrock 
et al. [1992] considered the control of a model car in extreme situations that are 
inherently dynamic. Typical dynamical situations are sliding and skidding. The model 
car has a one-horsepower electric motor and can accelerate to 20 mph in 3.5 seconds. 
Furthermore it has advanced features such as individual wheel suspension, disk 
brakes, and differential and shock absorbers. Three polaroid sensors are used for 
orientation (front, left, and right), and additional infrared sensors are mounted in each 
wheel to measure the individual speed. The model car is shown in figure 11-16. 

Figure 11-15. Trajectories of the fuzzy controlled model car [Sugeno and Nishida 
1985, p. 112]. 

Figure 11-16. Fuzzy model car [von Altrock et al. 1992, p. 42]. 

Since the conventional Mamdani max-min operators were not sufficient in this 
case, compensatory operators such as the γ-operator were used (see chapter 3). 
Another modification was the introduction of “rule weights” that are used to 
describe the plausibility of each rule. The objective of the car was to reach a target 
as fast as possible without hitting the walls or any obstacle. A typical experi- 
mental design is depicted in figure 11-17. 

Most of the results were very encouraging. However, in some situations the 
car lost its orientation due to the limited information obtained from the sensors. 
This can only be avoided if some sort of memory is used to compute the current 
orientation [cf. von Altrock et al. 1992, p. 48]. 


11.7.3 Control of a Diesel Engine 


Murayama et al. [1985] designed a fuzzy controller for a marine diesel engine. 
The objective here was to minimize the fuel consumption rate (FCR). The engine 
is controlled by fuel flow rate (Q), fuel injection timing (U), fuel injection 
duration (T), and inner pressure of the fuel pipe (P). Special attention was paid to the 
fuel injection timing, which influences the FCR directly. Figure 11-18 depicts the 
FCR as a function of the fuel injection timing. 

Figure 11-17. Experimental design [von Altrock et al. 1992, p. 48]. 

Since the data are noisy, gradient methods cannot be employed directly. 
Therefore the authors use an adaptive method to verify the results obtained by 
the gradient search. Fuzzy numbers and an adjustment method that uses a fuzzy 
set to assess the credibility of the computed results are employed. The control 
algorithm is depicted in figure 11-19. 

No rules are used to calculate the actual control output as in the Mamdani and 
the Sugeno controller. Therefore one may also consider this application as an 
application of fuzzy data analysis to a control problem. The results that were 
obtained with this simple method were, however, very encouraging. The fuzzy 
control method outperformed the conventional method clearly, as is shown in 
figure 11-20. 


11.7.4 Fuzzy Control of a Cement Kiln 


In this case, we will consider a physical process as the object of control. Let us 
first describe briefly the process itself [King and Karonis 1988, pp. 323]. 

Figure 11-18. FCR vs. fuel injection timing [Murayama et al. 1985, p. 64]. 


Cement is manufactured by heating a slurry consisting of clay, limestone, sand, 
and iron ore to a temperature that will permit the formation of the complex 
compounds of cement, dicalcium silicate ($C_2S$), tricalcium silicate ($C_3S$), tricalcium 
aluminate ($C_3Al$), and tetracalcium aluminoferrite ($C_4AlF$). In the first stage of 
the kilning process, the slurry is dried and excess water is driven off. In the second 
stage, calcining takes place, with the calcium carbonate decomposing to calcium 
oxide and carbon dioxide. In the final stage, burning takes place at 1,250-1,450°C, 
and free lime (CaO) combines with the other ingredients to form the cement com- 
pounds. The end product of the burning process is referred to as clinker. 

The kiln consists of a long steel shell about 130 m in length and 5 m in 
diameter. The shell is mounted at a slight inclination to the horizontal, and is lined 
with fire bricks. The shell rotates slowly, at approximately 1 rev/min, and the 
slurry is fed in at the upper or back end of the kiln. The inclination of the shell 
and its rotation transport the material through the kiln in about 3 hours 15 
minutes with a further 45 minutes spent in the clinker cooler. 

The heat in the kiln is provided by pulverized coal mixed with air, referred to 
as primary air. The hot combustion gases are sucked through the kiln by an 
induction fan at the back end of the kiln [Umbers and King 1981, p. 370]. 

Figure 11-19. Control algorithm [Murayama et al. 1985]. 

Figure 11-20. Experimental results [Murayama et al. 1985]. 

Figure 11-21. Schematic diagram of rotary cement kiln [Umbers and King 1981, p. 371]. 


Figure 11-21 illustrates the production process. The main problem in 
mathematically modeling a control strategy is that the relationships between input 
variables (measured characteristics of the process) and control variables are complex 
and nonlinear and contain time lags and interrelationships; in addition, the kiln’s 
response to control inputs depends on the prevailing kiln conditions. These were 
certainly reasons why a fuzzy control system was designed and used, which 
eventually even led to a commercially available fuzzy controller. 

From the many possible input and control variables, the following were chosen 
as particularly relevant. Input variables include 


1. exhaust gas temperature: back-end temperature (BT); 
2. intermediate gas temperature: ring temperature (RT); 
3. burning-zone temperature (BZ); 
4. oxygen percentage in exhaust gases ($O_2$); and 
5. liter weight (LW), which indicates clinker quality. 


The process is controlled by varying the following control variables: 


1. kiln speed (KS); 
2. coal feed (CS)—fuel; and 
3. induced draught-fan speed (BF). 


The calculation of the control action was composed of the following four 
stages: 


1. calculate the present error and its rate of change; 
2. convert the error values to fuzzy variables; 
3. evaluate the decision rules using the compositional rule of inference; and 
4. calculate the deterministic input required to regulate the process. 
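
The four stages can be sketched as a single Python function. This is a 
schematic illustration and not the kiln controller itself: the three 
triangular terms, the nine decision rules, the max-min form of inference, the 
center-of-area calculation in the last stage, and all names are assumptions 
made for the example. 

```python
def tri(x, left, peak, right):
    """Triangular membership function with modal value `peak`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical terms (negative, zero, positive) on a scaled universe [-1, 1].
TERMS = {"N": (-2.0, -1.0, 0.0), "Z": (-1.0, 0.0, 1.0), "P": (0.0, 1.0, 2.0)}

# Hypothetical decision rules: (error term, change term) -> control term.
RULES = {
    ("N", "N"): "P", ("N", "Z"): "P", ("N", "P"): "Z",
    ("Z", "N"): "P", ("Z", "Z"): "Z", ("Z", "P"): "N",
    ("P", "N"): "Z", ("P", "Z"): "N", ("P", "P"): "N",
}

OUTPUT_GRID = [i / 10 for i in range(-10, 11)]


def clamp(x):
    """Restrict a scaled value to the universe [-1, 1]."""
    return max(-1.0, min(1.0, x))


def control_action(setpoint, measurement, previous_error):
    # Stage 1: present error and its rate of change (per sampling step).
    error = measurement - setpoint
    change = error - previous_error

    # Stage 2: convert the crisp values to fuzzy variables.
    mu_error = {name: tri(clamp(error), *p) for name, p in TERMS.items()}
    mu_change = {name: tri(clamp(change), *p) for name, p in TERMS.items()}

    # Stage 3: evaluate the decision rules (max-min inference assumed).
    fuzzy_output = [0.0] * len(OUTPUT_GRID)
    for (e_term, c_term), u_term in RULES.items():
        alpha = min(mu_error[e_term], mu_change[c_term])
        for k, u in enumerate(OUTPUT_GRID):
            clipped = min(alpha, tri(u, *TERMS[u_term]))
            fuzzy_output[k] = max(fuzzy_output[k], clipped)

    # Stage 4: deterministic control value (center of area assumed).
    total = sum(fuzzy_output)
    if total == 0.0:
        return 0.0, error
    u_crisp = sum(u * m for u, m in zip(OUTPUT_GRID, fuzzy_output)) / total
    return u_crisp, error


u_at_setpoint, e_at_setpoint = control_action(20.0, 20.0, 0.0)
u_above_setpoint, _ = control_action(0.0, 0.5, 0.5)
```

The function returns the present error together with the control value so that 
the caller can pass it back as `previous_error` in the next sampling step. 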


Concerning the control strategies used, let us quote Larsen: 


The aim of the computerized kiln control system is to automate the routine control 
strategy of an experienced kiln operator. The applied strategies are based on detailed 
studies of the process operator experiences which include a qualitative model of influ- 
ences of the control variables on the measured variables [Larsen 1981, p. 337]. 


1. If the coal-feed rate is increased, the kiln drive load and the temperature in 
the smoke chamber will increase, while the oxygen percentage and the free 
lime content will decrease. 

2. If the air flow is increased, the temperature in the smoke chamber and the 
free lime content will increase, while the kiln drive load and the oxygen per- 
centage will decrease. 


On the basis of thorough discussions with the operators, Jensen [1976] defined 
75 operating conditions as fuzzy conditional statements of the type: 


IF drive load gradient is (DL,SL,OK,SH,DH) 

AND drive load is (DL,SL,OK,SH,DH) 

AND smoke chamber temperature is (L,OK,H) 

THEN change oxygen percentage is (VN,N,SN,ZN,OK,ZP,SP,P, VP) 


PLUS change air flow is (VN,N,SN,ZN,OK,ZP,SP,P,VP) 

The following fuzzy primary terms are used for the measured variables: 


1. DL = drastically low      5. SH = slightly high 
2. L = low                   6. H = high 
3. SL = slightly low         7. DH = drastically high 
4. OK = ok 


The following fuzzy primary terms are used for the control variables: 


1. VN = very negative        6. ZP = zero positive 
2. N = negative              7. SP = small positive 
3. SN = small negative       8. P = positive 
4. ZN = zero negative        9. VP = very positive 
5. OK = ok 


The linguistic terms are represented by membership functions with four discrete 
values in the interval [0, 1] associated with 15 discrete values of the scaled vari- 
ables in the interval [-1, +1]. 
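
The figure of 75 operating conditions quoted above is consistent with taking 
every combination of the antecedent terms (5 · 5 · 3 = 75). The short Python 
sketch below enumerates these combinations; reading the count this way is our 
interpretation, and the helper `describe` is purely illustrative. 

```python
from itertools import product

GRADIENT_TERMS = ["DL", "SL", "OK", "SH", "DH"]   # drive load gradient
LOAD_TERMS = ["DL", "SL", "OK", "SH", "DH"]       # drive load
TEMPERATURE_TERMS = ["L", "OK", "H"]              # smoke chamber temperature

# One operating condition per combination of antecedent terms.
conditions = list(product(GRADIENT_TERMS, LOAD_TERMS, TEMPERATURE_TERMS))


def describe(condition):
    """Write one operating condition in the IF ... AND ... form used above."""
    gradient, load, temperature = condition
    return (f"IF drive load gradient is {gradient} AND drive load is {load} "
            f"AND smoke chamber temperature is {temperature}")


number_of_conditions = len(conditions)
first_condition = describe(conditions[0])
```

Each of these antecedents would still have to be paired with consequence terms 
for the oxygen percentage and the air flow, which the sketch leaves open. 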

In order to simplify the implementation of the fuzzy logic controller, 
Ostergaard [1977] defined 13 operating conditions as fuzzy conditional statements 
of the type: 


IF drive load gradient is (SN,ZE,SP) 
AND drive load is (LN,LP) 
AND free lime content is (LO,OK,HI) 
THEN change burning zone temperature is (LN,MN,SN,ZE,SP,MP,LP) 


The following fuzzy primary terms are used: 


1. LP = large positive       7. SN = small negative 
2. MP = medium positive      8. MN = medium negative 
3. SP = small positive       9. LN = large negative 
4. ZP = zero positive       10. HI = high 
5. ZE = zero                11. OK = ok 
6. ZN = zero negative       12. LO = low 


The 13 operating conditions are defined by taking only some of the combina- 
tions into account, and by including also the previous values of the drive load 
gradient, the latter being calculated from the changes in the drive load. In order 
to decide whether the oxygen percentage set point or the air flow should be 
changed, three additional fuzzy rules for each operating condition are formulated 
based on the actual values of the oxygen percentage and the smoke chamber tem- 
perature, resulting in 39 control rules. 

Details of membership functions used can be found in Holmblad and 
Ostergaard [1982] and results of testing the system in Umbers and King [1981] 
and Larsen [1981]. We shall not describe these details here, primarily because 
they are not of high general interest. 

Before we turn to a quite different type of control, it should be mentioned, 
however, that the reader can find descriptions and references to more than 10 
further projects of the type described here in Mamdani [1981], in Pun [1977], 
and in Sugeno [1985a]. 


11.8 Tools 


Fast and easy implementation of control systems requires adequate tools that 
assist the system designer in the design and coding, which would be time- 
consuming if performed by hand. An increasing number of tools exist both for 
conventional and fuzzy logic control. Modern tools use graphical animation and 
offer interactive on-line development capabilities instead of precompiling. Pre- 
compiler tools precompile the linguistically designed controller into a code, e.g., 
in C. This can then be combined with other codes. Then the controller is started 
and the closed-loop behavior is observed. If the behavior isn’t sufficient—which 
usually is the case—the control is interrupted and a new, modified, controller is 
defined and precompiled. This controller is linked to the process and so on. This 
method is inefficient and time-consuming, since every modification implies inter- 
ruption of the control and compiling and linking. 

The interactive approach is much more efficient because the designer is 
enabled to study the direct consequences of modifications of design parameters 
such as rules and fuzzy sets. Here we shall consider, as an example, the 
fuzzyTECH design tool by INFORM [Inform 1995]. Figure 11-22 shows the 
development philosophy of fuzzyTECH. 

This tool runs on most hardware platforms and can be used for on-line opti- 
mization of a fuzzy control system. 

The system introduces the concept of “normalized rule bases” that makes even 
large rule bases easy to comprehend. A screenshot of a rule base for the model 
car [von Altrock et al. 1992] is shown in figure 11-23. 

The whole inference process is visualized in different windows on-line, and 
auxiliary screens visualizing the phase plane and transfer characteristics help the 
designer in tracing erroneous rules or term definitions. Figure 11-24 shows the 
simulation screen of the model car presented at the FUZZ-IEEE conference in 
1992. 

We sum up by stating that FLC design is accelerated and made more efficient 
by the use of modern graphical development tools. Such tools can also be used 
effectively for training purposes in connection with simulation models or 
laboratory processes. 

Figure 11-22. Controller development in fuzzyTECH [von Altrock et al. 1992]. 

Figure 11-23. Rule base for model car [von Altrock et al. 1992]. 

Figure 11-24. Simulation screen [von Altrock et al. 1992]. 


11.9 Stability 


Stability and performance of the closed-loop system are considered by many 
control engineers to be the main criteria for assessing the quality of a control system. 
In many cases it is desirable to prove the stability of the controlled system. It is, 
of course, only possible to prove the stability of the process model and not of the 
process itself; however, stability can often be proved for a wide range of model 
parameters, and the risk of instability can thus be minimized. The lack of formal 
techniques for stability analysis has been a main point of criticism of FLC 
systems. There do, however, already exist many approaches to prove the stabil- 
ity of a closed-loop FLC system. 

When studying the stability of FLC systems, one has to use a model of the 
process that can be fuzzy or crisp. Most methods use crisp process models and 
conventional nonlinear control theory to prove stability. In this context, the fuzzy 
controller is considered as a nonlinear transfer element, i.e., the output is 
determined as a function of the input variables, $u = \Phi(r)$ [Kickert and Mamdani 1978]. 
Such a system is depicted in figure 11-25. Set-point values and noise can be 
neglected because stability is a system property. This means that the control action 
for a known input value can be derived by calculating the result of rule firing, 
rule aggregation, and defuzzification. The problem is often to find a suitable 
representation of the fuzzy controller in this context. 

Figure 11-25. Fuzzy controller as a nonlinear transfer element. 

In the case of a nonlinear crisp process model, one can distinguish between 
time-domain and frequency-domain models [Bretthauer and Opitz 1994]. The 
time-domain models include the state-space approach, Ljapunov theory, 
hyperstability theory, and the bifurcation theory approach. The class of 
frequency-domain methods includes the harmonic-balance approach and the circle and Popov 
criteria. Figure 11-26 summarizes the different approaches. 

A graphical approach to stability analysis is the state-space approach, where 
the trajectory of the closed-loop system is displayed in the two-dimensional state 
space. Naturally, this approach is limited to two-dimensional systems. The main 
idea is to partition the space that is defined by the input base variables of the 
rules, which is called the linguistic state space, according to the terms of the 
linguistic variables. This leads to sections of the state space where the degree of 
membership of an input variable $x_i$ in one term (say, term $k_i$) is higher than the 
degree of membership in the other terms, i.e., $\mu^{k_i}(x_i) \geq \mu^{j_i}(x_i)$ for all $j_i \neq k_i$. Since 
the rule base was defined in terms of these input variables (see table 11-1), we 
can infer which term of the output variable is dominant in the corresponding 
sector of the state space. Figure 11-27 shows the linguistic state space that 
corresponds to our heating system example. Note that every input consisting of a 
temperature and a change of temperature can be located in the state space. 

Figure 11-26. Classification of stability analysis approaches. 

Figure 11-27. Linguistic state space. 

Suppose that we start the controller with an input temperature of 13 degrees 
and a change of temperature of -1° per minute. The controller starts the heating 
system with approximately medium power, and the temperature rises. Due to this 
control action, other regions of the state space are reached and other rules become 
dominant. The sequence of regions that are reached in the state space depends on 
the fuzzy controller and the system to be controlled and is called the linguistic 
trajectory. A possible linguistic trajectory of the heating system example is 
depicted in figure 11-28. The corresponding linguistic trajectory is (l,nb), 
(l,ns), (l,z), (l,ps), (c,ps), (h,ps), (h,z), (c,z), where the first entry is the term of the 
linguistic variable temperature and the second entry is the term of the linguistic 
variable change of temperature, e.g., (l,ns) means the region with low 
temperature and negative small change of temperature. The linguistic trajectory shows 
that the system reaches an equilibrium point, namely, (c,z), where the 
temperature is comfortable and does not change. If an equilibrium point is reached for 
all possible starting configurations in the state space, then the system is stable. 
The state space approach has the advantage of being easy to understand and is 
of great help when designing a fuzzy controller, since the impact of rules can be 
seen directly in the state space. Some software tools offer the possibility of 
plotting the linguistic trajectory of the system on the computer screen. We close the 
discussion of this approach by noting that a system that reaches an equilibrium 
point in the linguistic state space may have underlying oscillations which cannot 
be detected by this method due to the coarseness of the partition induced by the 
membership functions of the terms of the linguistic variables. The heating system 
may, as an example, lead to temperatures varying between 18° and 19° Celsius 
and small negative and positive changes of temperature if the power can only be 
adjusted discretely. The activated region in the state space would, however, 
always be (c,z). 
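
The construction of a linguistic trajectory can be sketched in Python as 
follows. The sketch is an illustration under assumptions: the triangular 
membership functions, their parameters, and the sample states are hypothetical 
and do not reproduce the heating system example or figure 11-28. 

```python
def tri(x, left, peak, right):
    """Triangular membership function with modal value `peak`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical terms: low, comfortable, high temperature.
TEMP = {"l": (0.0, 10.0, 18.0), "c": (10.0, 18.0, 26.0), "h": (18.0, 26.0, 36.0)}
# Hypothetical terms for the change of temperature (nb ... pb).
DELTA = {"nb": (-3.0, -2.0, -1.0), "ns": (-2.0, -1.0, 0.0), "z": (-1.0, 0.0, 1.0),
         "ps": (0.0, 1.0, 2.0), "pb": (1.0, 2.0, 3.0)}


def dominant(x, terms):
    """Term with the highest degree of membership for the value x."""
    return max(terms, key=lambda name: tri(x, *terms[name]))


def linguistic_trajectory(states):
    """Sequence of dominant regions, with consecutive repetitions removed."""
    trajectory = []
    for temperature, change in states:
        region = (dominant(temperature, TEMP), dominant(change, DELTA))
        if not trajectory or trajectory[-1] != region:
            trajectory.append(region)
    return trajectory


# Hypothetical sampled states (temperature, change of temperature).
states = [(13.0, -1.0), (12.8, -0.2), (13.5, 0.8), (16.0, 1.0),
          (19.0, 0.3), (18.5, -0.1), (19.0, 0.1), (18.0, -0.1)]

trajectory = linguistic_trajectory(states)
```

The last four sample states oscillate between 18° and 19° with small positive 
and negative changes, yet they all fall into the region (c,z). The linguistic 
trajectory therefore ends in a single equilibrium region, which illustrates 
why underlying oscillations remain invisible at this level of description. 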

Since the introduction of the formal methods of FLC stability analysis requires 
a solid background in nonlinear control theory, a detailed discussion of the 
approaches is not possible in this book. We limit ourselves to the specification of 
the different approaches and refer interested readers to the literature. 
Topics and references include the following: controller as relay [Kickert and 
Mamdani 1978], limit theorems [Bouslama and Ichikawa 1992], fuzzy sliding 
mode control [Hwang and Lin 1992], Ljapunov theory [Langari and Tomizuka 
1990; Tanaka and Sugeno 1992; Kiendl and Rüger 1993], harmonic balance 
[Kiendl and Rüger 1993], circle criterion [Ray and Majunder 1984], conicity 
criterion [Aracil et al. 1991], and vector fields [Aracil et al. 1988, 1989]. 

Figure 11-28. Linguistic trajectory. 

An overview of some of these approaches is found in Driankov et al. [1993], 
and a literature survey is given by Bretthauer and Opitz [1994]. 


11.10 Extensions 


Most of the basic problems of FLC have been resolved, and researchers are now 
investigating advanced topics such as stability, adaptive fuzzy control, hybrid 
systems, neuro-fuzzy systems, and FLC systems tuned by genetic algorithms 
(GAs) that are inherently adaptive systems. Progress is fast in these areas, and 
promising experimental results have been obtained. 

With the rising popularity of FLC, more engineers will be trained in this area 
in the future. This training will lead to more applications of FLC systems and to
growing field experience among the engineers involved. Fuzzy logic control is an
integral part of modern control theory, not replacing conventional methods but rather
complementing them.

Since the literature in fuzzy control is too vast to be discussed in its entirety 
in this textbook, a summary is given below. It is primarily intended for those who 
have an extended interest in this area: 

One of the first books on fuzzy logic control was written by W. Pedrycz in 
1989 [Pedrycz 1989] and focuses on many concepts of FLC. The use of fuzzy 
relations in connection with FLC systems is discussed thoroughly. A second 
edition of this popular book appeared in 1993 [Pedrycz 1993] and also covers
new directions, such as neural network methods. Many survey articles on FLC
have appeared in control journals in recent years, and we very much recommend
the survey of Lee [1990], which covers all basic aspects. The first major book on
applications was the one edited by Sugeno [1985a]. Zimmermann and von
Altrock [1994] provide a more recent collection of applications, most of them
describing German industrial projects. Jamshidi et al. [1993] also cover a wide 
area of different applications, including robotics and flight control, most of 
which have been realized in the United States. An interesting collection of the 
now-famous Japanese applications of fuzzy control is provided by Hirota [1993]. 
Many articles describe practical implementations of FLC systems and can be
found in journals covering mainly fuzzy sets as well as in journals on automatic
control. From an engineer’s point of view, the book written by Driankov,
Hellendoorn, and Reinfrank [1993] covers all major aspects of fuzzy control. A
background in conventional control theory is, however, necessary to understand
some of the chapters.


Exercises


1. a. Draw the block diagram of a Mamdani/Sugeno controller and explain 
each function separately. 
b. What are the differences between the Mamdani and the Sugeno 
controller? 
2. Which design parameters can be varied in a fuzzy controller? 


3. A Mamdani controller has the following rule base: 
[Rule base table: entries not legible in the source.]


The linguistic variables are defined as follows:

Error: [Figure: membership functions μ(error) of the terms "negative" and "positive" over the error axis from -7 to 7.]

Change of error: [Figure: membership functions μ(change of error) of the terms "negative" and "positive" over the change-of-error axis from -7 to 7.]

Control action: [Figure: membership functions μ(control action) over the control-action axis from 1 to 14; only the term "small" is legible.]


a. Calculate the fuzzy set of control, when error = 2 and change of error = 4. 
b. Calculate the control action when 
(i) mean of maxima 
(ii) center of sums 
is used as a defuzzification procedure. 
4. Which operators can be varied in the Mamdani controller? Discuss the choice 
of operators in connection with fuzzy controllers. 


12 FUZZY DATA BASES AND QUERIES


12.1 Introduction 


Data bases are one form of modeling parts of the real world. They may contain 
descriptions of technical systems, of enterprises, of scientific activities, of land- 
scapes (geographical information systems), or other domains. The world of data 
bases is the world of digital computers, one of the most typical dichotomous 
systems. It is, therefore, not surprising that the type of storage is crisp and that 
all data processing, e.g. input, storage, querying is crisp, no matter whether the 
factual relationships described in a database are crisp or uncertain or fuzzy. 

For approximately 20 years researchers around the world have been concerned 
with the use of fuzzy set theory to represent imprecision in data bases. This 
research has been hampered by the fast development of data base technology. 
From the graph-theoretic paradigm, data base theory moved to relational databanks
and on to object-oriented designs, each of these paradigms requiring different 
fuzzy approaches. This is probably one of the reasons why commercial realiza- 
tions of fuzzy databank technology lag behind the theory. 

In this chapter we cannot describe all existing approaches to
fuzzy databank technology (interested readers are referred to [Petry 1996;
Bordogna and Pasi 2000; Pons et al. 2000]). We shall rather focus, by way of
example, on relational databanks and on similarity-based fuzzy models.


12.2 Fuzzy Relational Databases


The relational data model is based on set-theoretic concepts. Essentially, rela- 
tional data bases consist of relations in two-dimensional (row and column) 
format. Rows are called tuples and correspond to records and columns are called 
domains or attributes and correspond to fields. One or more attributes are distin- 
guished as the key attributes. We will consider relations of the so-called “third 
normal form”, which possess two characteristics: first, each attribute fully 
depends on the entire key (and not part of it). Secondly, each of the non-key attrib- 
utes is non-transitively dependent on the key (i.e. they depend only on the key 
and not on each other). 


Example 12-1 


Let us consider a data base that describes materials which are supplied by 
different suppliers. The first table shows the suppliers together with their loca- 
tions, the material supplied and their evaluated quality. The second table contains 
again the suppliers and information about their delivery reliability and their costs 
and the third table describes the materials supplied. 


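Suppliers

supplier | location | material | quality
DEWAG | Paris | 802.020 | medium
INFORM | [illegible] | 802.025 | [illegible]
[remaining rows not legible in the source]


Reliability

supplier | material | reliability
DEWAG | 802.025 | [illegible]
DEWAG | 802.020 | [illegible]
[illegible] | 802.025 | [illegible]
[illegible] | 802.025 | [illegible]


Materials

material | description | standard
[rows not legible in the source]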


Access to a database via a query is normally based on relational algebra. This
allows the relations or tables to be manipulated and combined so that the
requested query results are provided.

A relational algebra operation consists of an operation name, one or more rela- 
tion names, one or more domain names and an optional conditional expression. 
For example, an operation on the above relations might be: 






Select Companies where Material = EURO-NORM and Location = Paris 


which would result in: 


DEWAG in Paris. 
# 
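The selection above can be sketched in Python, assuming the relations are held as lists of dictionaries. Only the DEWAG row is taken from Example 12-1; the other rows, the assignment of the standard EURO-NORM to material 802.020, and the helper name `select` are assumptions made purely for illustration.

```python
# Hypothetical relations; only the DEWAG row comes from Example 12-1.
suppliers = [
    {"supplier": "DEWAG", "location": "Paris", "material": "802.020", "quality": "medium"},
    {"supplier": "SUP-A", "location": "Lyon", "material": "802.020", "quality": "high"},  # assumed row
    {"supplier": "SUP-B", "location": "Paris", "material": "900.001", "quality": "low"},  # assumed row
]
materials = [
    {"material": "802.020", "standard": "EURO-NORM"},  # assumed standard
    {"material": "900.001", "standard": "OTHER"},      # assumed row
]


def select(relation, condition):
    """Return the tuples of a relation that satisfy a crisp condition."""
    return [row for row in relation if condition(row)]


# Materials that follow the requested standard.
euro_norm = {m["material"] for m in select(materials, lambda m: m["standard"] == "EURO-NORM")}

# Select Companies where Material = EURO-NORM and Location = Paris
result = select(suppliers, lambda s: s["material"] in euro_norm and s["location"] == "Paris")
companies = [(s["supplier"], s["location"]) for s in result]
print(companies)
```

Every condition is evaluated as true or false, so a tuple is either part of the result or not; this is the crisp behavior that the fuzzy extensions below relax.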


So far all components in the relations were crisp. If this is not an adequate descrip- 
tion of reality, fuzzy rather than crisp relations might be used (see chapter 6). 
The fuzziness of such a relation can either be modeled by considering lin- 
guistic values of the domains of attributes as terms of linguistic variables (see 
chapter 9), or one can assign to the relations an additional degree of membership.
In this case the table “Reliability” in the last example would, for instance, look
as follows:


Reliability

supplier | material | reliability | μ_R
DEWAG | 802.025 | high | .8
MAM | [illegible] | medium | .6
KBA | 802.025 | low | [illegible]
[illegible] | [illegible] | high | [illegible]








In this case the “values” for the attribute reliability would obviously be
considered as being crisply defined (as symbols) and the μ_R would indicate the degree
to which the relation is true. There might be another table which shows the
degrees of membership for other “reliabilities” (high, low or medium) of the
suppliers.
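A relation with an additional degree of membership can be sketched as follows. This is a minimal illustration, assuming tuples of the form (supplier, material, reliability, μ_R); the rows loosely follow the legible part of the table, and the helper name `alpha_cut` and the threshold value are assumptions.

```python
# Each tuple of the fuzzy relation carries a degree of membership mu_R
# that states to which degree the tuple belongs to the relation.
# Rows are illustrative; the material of the second row is not legible
# in the source and is therefore left as None.
reliability = [
    ("DEWAG", "802.025", "high", 0.8),
    ("MAM", None, "medium", 0.6),
]


def alpha_cut(relation, alpha):
    """Return the tuples whose degree of membership is at least alpha."""
    return [row for row in relation if row[-1] >= alpha]


print(alpha_cut(reliability, 0.7))
```

The attribute values themselves stay crisp symbols here; only the membership of the whole tuple in the relation is graded.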

Fuzzy data bases are still very rare in practice. One of the reasons may be
that companies are very hesitant to replace their (crisp) data bases by fuzzy data
banks before they are convinced that it is worthwhile or necessary to do this.

Another application of fuzzy set theory is to design fuzzy query languages for
crisp data bases. This might avoid replacing existing crisp data banks while still
taking advantage of the strengths of fuzzy set theory.


12.3 Fuzzy Queries in Crisp Databases 


With respect to databases fuzzy sets can primarily be used in two directions: first 
to differentiate between different degrees of relevance, strength of relations etc. 
Secondly, they can also be used to reduce complexity, i.e. to extract from large 
masses of data relevant information. The first goal was considered in the last 
section. Now we want to focus on the second goal. 

In the last section of this chapter we called all the values that an attribute could
have the domain of this attribute. From a user’s point of view not all values in
the domain of an attribute will have to be considered different. Values may be
distinguishable, e.g. 4 and 5, but the user might consider them as equivalent in
the context of a certain query.

We shall call elements in the domain of an attribute that have, in a certain
context, the same meaning “equivalent”. This can be expressed in the form of an
equivalence relation.


Example 12-2 
Let the domain of the attribute “quality” be defined as
D(quality) = {high, medium, sufficient, low}.


For the purpose of a certain query the user is only interested in whether the quality
is either “high or medium” or “sufficient or low”.
This can be expressed by the following equivalence relation, E: 








E          | high | medium | sufficient | low
high       |  1   |   1    |     0      |  0
medium     |  1   |   1    |     0      |  0
sufficient |  0   |   0    |     1      |  1
low        |  0   |   0    |     1      |  1





Hence, the domain of the attribute “quality” in this context is partitioned into two 
subsets of equivalent values which we will call “equivalence classes”. 
Expressed differently: 


C(quality) = {{high, medium}, {sufficient, low}}. 


In the context C the equivalence relation has partitioned the domain of the 
attribute quality into two equivalence classes. 

The introduction of equivalence classes obviously reduces the complexity of
the data to be considered by reducing the number of components of D(quality)
(four values) to that of C(quality) (two classes).
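The partition of a domain by an equivalence relation can be sketched in a few lines. The 0/1 matrix below encodes the partition stated in the text (high and medium equivalent, sufficient and low equivalent); the function name `equivalence_classes` is an assumption.

```python
domain = ["high", "medium", "sufficient", "low"]

# Equivalence relation E as a 0/1 matrix over the domain:
# E[i][j] == 1 means that domain[i] and domain[j] are equivalent.
E = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]


def equivalence_classes(domain, relation):
    """Group the domain values into the classes induced by the relation."""
    classes = []
    assigned = set()
    for i, value in enumerate(domain):
        if value in assigned:
            continue
        cls = [domain[j] for j in range(len(domain)) if relation[i][j] == 1]
        classes.append(cls)
        assigned.update(cls)
    return classes


classes = equivalence_classes(domain, E)
print(classes)
```

The four domain values collapse into two classes, which is the complexity reduction described above.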


Example 12-3 [Schindler 1997] 


Let us consider the following data base which describes suppliers delivering 
materials with different qualities and different delivery delays:

supplier | material | quality | delay
DEWAG | 802.025 | [illegible] | [illegible]
DEWAG | 809.200 | high | 8
KBA | [illegible] | [illegible] | [illegible]
KBA | 840.024 | low | 9
BAW | 802.025 | sufficient | 8
MAM | 840.024 | medium | 6
[illegible] | 809.200 | high | [illegible]
[further rows and the "supl." column are not legible in the source]


The domains of “quality” and “delay” are

D(quality) = {high, medium, sufficient, low}
D(delay) = [1, 10].

The goal of a query is to classify the suppliers into 4 groups, such that
appropriate measures can be taken to improve the supply situation.
The manager of the purchasing department believes that for the query the
following contexts are appropriate:

C(quality) = {{high, medium}, {sufficient, low}},

where {high, medium} is considered good quality and {sufficient, low} indicates bad
quality, and

C(delay) = {[1, 5], (5, 10]},

where a delay of [1, 5] is considered acceptable and (5, 10] is considered
unacceptable.

Applying these contexts to our data base we obtain the following classes:

[Table with the columns supl. | supplier | quality | delay, assigning each supplier to a quality class such as {sufficient, low} and to a delay class; the entries are not legible in the source.]

An interpretation of these 4 classes is shown in the following matrix: 


























D(delay) \ D(quality) | good                                | bad
unacceptable          | C2: ask supplier to decrease delays | C4: terminate relationship
acceptable            | C1: expand relationship             | C3: ask supplier to improve quality
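The crisp assignment of a supplier to one of the four classes can be sketched as follows. The class boundaries follow the contexts C(quality) and C(delay) given above; the function name `crisp_class` is an assumption, and delays outside [1, 10] are not checked in this sketch.

```python
GOOD_QUALITY = {"high", "medium"}  # the other values, sufficient and low, count as bad

ACTIONS = {
    "C1": "expand relationship",
    "C2": "ask supplier to decrease delays",
    "C3": "ask supplier to improve quality",
    "C4": "terminate relationship",
}


def crisp_class(quality, delay):
    """Map a (quality, delay) pair to exactly one of the classes C1..C4."""
    good = quality in GOOD_QUALITY
    acceptable = 1 <= delay <= 5  # delays in (5, 10] are unacceptable
    if good and acceptable:
        return "C1"
    if good:
        return "C2"
    if acceptable:
        return "C3"
    return "C4"


label = crisp_class("sufficient", 8)
print(label, "->", ACTIONS[label])
```

A supplier with sufficient quality and a delay of 8 lands in C4, and stays there even if the delay improves to 6, since the boundaries are crisp.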


Obviously suppliers in one class are not distinguishable according to their
attractiveness. This might be demotivating for suppliers when they improve quality or
delay and still remain in the same class. One way to improve this situation is to
define fuzzy sets over the attributes “quality” and “delay”.

Let us define the following two linguistic variables:

The linguistic variable “delay” shall have two terms “acceptable” and
“unacceptable” with the following membership functions:

\[
\mu_{acc.}(u) =
\begin{cases}
1 & \text{for } 1 \le u \le 3 \\
(7 - u)/(7 - 3) & \text{for } 3 < u < 7 \\
0 & \text{for } 7 \le u
\end{cases}
\]

\[
\mu_{unacc.}(u) =
\begin{cases}
0 & \text{for } u \le 3 \\
(u - 3)/(7 - 3) & \text{for } 3 < u < 7 \\
1 & \text{for } 7 \le u
\end{cases}
\]


For the linguistic variable “quality” we shall define the two terms “good” and “bad”
with the following membership functions:


\[ \mu_{good}(u) = \{(\text{high}, 1), (\text{medium}, .67), (\text{sufficient}, .33)\} \]
\[ \mu_{bad}(u) = \{(\text{medium}, .33), (\text{sufficient}, .67), (\text{low}, 1)\} \]
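These membership functions translate directly into code. The sketch below assumes the usual convention that elements not listed in a discrete fuzzy set have degree of membership 0; the function and dictionary names are illustrative.

```python
def mu_acceptable(u):
    """Degree to which a delay u is acceptable."""
    if u <= 3:
        return 1.0
    if u >= 7:
        return 0.0
    return (7 - u) / (7 - 3)


def mu_unacceptable(u):
    """Degree to which a delay u is unacceptable."""
    if u <= 3:
        return 0.0
    if u >= 7:
        return 1.0
    return (u - 3) / (7 - 3)


# Discrete membership functions for the terms of "quality".
# Values that are not listed in the text are assumed to have degree 0.
MU_GOOD = {"high": 1.0, "medium": 0.67, "sufficient": 0.33, "low": 0.0}
MU_BAD = {"high": 0.0, "medium": 0.33, "sufficient": 0.67, "low": 1.0}

print(mu_acceptable(8), mu_unacceptable(8), MU_GOOD["sufficient"], MU_BAD["sufficient"])
```

For a delay of 8 and sufficient quality this gives the degrees 0, 1, .33, and .67.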


Graphically the class matrix would now look as follows:

[Figure: fuzzy class matrix with the axes D(quality) and D(delay), the latter divided into "acceptable" and "unacceptable"; legible cell labels include "ask supplier to decrease delays", C3 "ask supplier to improve quality", and C4 "terminate relationship".]

For the various suppliers the degrees of membership in the different terms can
easily be determined by substituting the values of the attributes in the
membership functions.

The supplier “BAW” in the data base, for instance, would provide material 
which is of good quality to the degree .33 and of bad quality to the degree .67. 
His delay is acceptable to the degree 0 and unacceptable to the degree 1. 

We might, however, also be interested in either the degree to which
a supplier belongs to the various classes or the degree to which he
is “attractive”, where “attractive” can be considered as “having good quality and
an acceptable delivery delay”.

In this case we have to aggregate the respective degrees of membership. Since
it is an “and” aggregation, we could either use a t-norm or a compensatory
aggregation. We shall assume that the two attributes are compensatory and, therefore,
choose the “compensatory and” (definition 3-20). We shall compute the degree
of membership of the suppliers in the different classes and use $\gamma = .5$.

For supplier “BAW” the degrees of membership for classes 1 and 3 are obvi- 
ously 0. 

For class 2 the terms “unacceptable” (of delay) and “good” (of quality) 
are relevant. For “BAW” these are 1 (delay of 8) and .33 (sufficient quality), 
respectively. 

Hence, using the $\gamma$-operator with $\gamma = .5$:


\[ \mu_{C2}(\mathrm{BAW}) = (1 \cdot .33)^{.5} \, (1 - (1 - 1)(1 - .33))^{.5} = .57 \cdot 1 = .57 \]


For class 4 we would obtain accordingly $\mu_{C4}(\mathrm{BAW}) = .82$.


Obviously these two degrees of membership do not add up to 1. If we want to
obtain normalized degrees of membership, we can divide all degrees of
membership for “BAW” by the sum of degrees of membership of “BAW” to the 4
classes (this is the cardinality according to definition 2-5).

For “BAW” the cardinality is (.57 + .82) = 1.39 and hence, we obtain the class
memberships of


\[ \mu_{BAW}(1) = 0 \]
\[ \mu_{BAW}(2) = .41 \]
\[ \mu_{BAW}(3) = 0 \]
\[ \mu_{BAW}(4) = .59 \]
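The computation for “BAW” can be retraced with a short sketch. It assumes the usual form of the “compensatory and”, i.e. the product of the degrees raised to the power 1 − γ times the algebraic sum raised to the power γ; the function name `gamma_and` and the dictionary layout are illustrative.

```python
from math import prod


def gamma_and(degrees, gamma=0.5):
    """Compensatory and: (prod mu_i)^(1-gamma) * (1 - prod(1 - mu_i))^gamma."""
    product = prod(degrees)
    algebraic_sum = 1 - prod(1 - d for d in degrees)
    return product ** (1 - gamma) * algebraic_sum ** gamma


# BAW: sufficient quality and a delay of 8.
acceptable, unacceptable = 0.0, 1.0
good, bad = 0.33, 0.67

classes = {
    1: gamma_and([acceptable, good]),    # acceptable delay, good quality
    2: gamma_and([unacceptable, good]),  # unacceptable delay, good quality
    3: gamma_and([acceptable, bad]),     # acceptable delay, bad quality
    4: gamma_and([unacceptable, bad]),   # unacceptable delay, bad quality
}

# Normalize by the cardinality, i.e. the sum over the four classes.
cardinality = sum(classes.values())
normalized = {k: v / cardinality for k, v in classes.items()}

print({k: round(v, 2) for k, v in classes.items()})
print({k: round(v, 2) for k, v in normalized.items()})
```

Rounded to two decimals, the unnormalized degrees for classes 2 and 4 come out as .57 and .82 and the normalized ones as .41 and .59, in agreement with the values in the text.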


The remaining suppliers supply more than one material.
Here we have two alternative ways of proceeding, depending on whether we
are interested in a specific material delivered by several suppliers or whether we
want to evaluate suppliers with respect to all materials they supply. We will
assume the latter. In this case we compute the degrees of membership for all
materials, suppliers and classes separately and then add the degrees of membership of
different materials of one supplier for each class.

“MTX”, for example, supplies two materials (802.025 and 840.024) with dif- 
ferent ratings. Let us consider class 1: 

Material “802.025” has degrees of membership of 1 and 1, respectively. Material
“840.024” has 1 and 0.75. Hence, the first material has an (unnormalized)
degree of membership of 1 and “840.024” one of .87.

Following Ozawa and Yamada [1994] we add these two degrees of membership
to determine the degree of membership of MTX to class 1. After we have
determined the (unnormalized) degrees of membership of MTX to the other
classes we will find that the cardinality for MTX is 2.37. Hence, MTX belongs
to class 1 to the degree


\[ \frac{1.87}{2.37} \approx .79 \]
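The aggregation over several materials of one supplier can be sketched in the same way. The degrees (good quality, acceptable delay) for the two materials of MTX are those stated in the text; that the degrees of “bad” and “unacceptable” are the complements of these values follows from the membership functions above, and the variable names are illustrative.

```python
from math import prod


def gamma_and(degrees, gamma=0.5):
    """Compensatory and: (prod mu_i)^(1-gamma) * (1 - prod(1 - mu_i))^gamma."""
    product = prod(degrees)
    algebraic_sum = 1 - prod(1 - d for d in degrees)
    return product ** (1 - gamma) * algebraic_sum ** gamma


# (mu_good, mu_acceptable) for each material supplied by MTX.
mtx = {"802.025": (1.0, 1.0), "840.024": (1.0, 0.75)}

totals = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}
for good, acceptable in mtx.values():
    bad, unacceptable = 1 - good, 1 - acceptable
    # Add the class memberships of the single materials for each class.
    totals[1] += gamma_and([acceptable, good])
    totals[2] += gamma_and([unacceptable, good])
    totals[3] += gamma_and([acceptable, bad])
    totals[4] += gamma_and([unacceptable, bad])

cardinality = sum(totals.values())
share_class_1 = totals[1] / cardinality
print(round(totals[1], 2), round(cardinality, 2), round(share_class_1, 2))
```

The sketch gives 1.87 for class 1 and a cardinality of 2.37, matching the text, and hence a normalized degree of about .79.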


The following table shows the unnormalized degrees of membership of the sup- 
pliers to the classes. The last row of this matrix shows the respective cardinali- 
ties. If these are used for normalization, we arrive at the subsequent matrix of 
normalized degrees of membership. 


[Table: one column per supplier; rows C1 (expand relationship), C2 (ask supplier to decrease delays), C3 (ask supplier to improve quality), C4 (terminate relationship), and cardinality. The numerical entries are not legible in the source.]

Partition matrix of non-normalized degrees of membership


[Table: one column per supplier; rows C1 (expand relationship), C2 (ask supplier to decrease delays), C3 (ask supplier to improve quality), and C4 (terminate relationship). The numerical entries are not legible in the source.]

Partition matrix of normalized degrees of membership


In the above example the aggregation of the degrees of membership was performed
by using the $\gamma$-operator with $\gamma = 0.5$. As was already described in chapter 3, this
models an aggregation in the middle between the “logical and” and the “linguistic or”.
More or less compensation can be achieved by varying $\gamma$ between zero and
one. It might also be appropriate to assign different weights (importance) to the
various attributes. This is also possible when using the $\gamma$-operator; it requires some
caution, however (see [Zimmermann and Zysno 1983]).


13 FUZZY DATA ANALYSIS


13.1 Introduction 


The terms data analysis, pattern recognition, and data mining are often used 
synonymously, and we shall do the same here. On the one hand, this area is 
one of the oldest and most obvious application areas for fuzzy set theory. On 
the other hand, pattern recognition existed long before fuzzy sets became 
known. 

This topic embraces a very large and diversified literature. It includes research 
in the areas of artificial intelligence, interactive graphic computers, computer 
aided design, psychological and biological pattern recognition, linguistic and 
structural pattern recognition, and a variety of other research topics. One could 
possibly distinguish between mathematical pattern recognition (primarily cluster 
analysis) and nonmathematical pattern recognition. One of the major differences 
between these two areas is that the latter is far more context dependent than the 
former: a heuristic computer program that is able to select features of chromo- 
somal abnormalities according to a physician’s experience will have little use for 
the selection of wheat fields from a photo-interpretation viewpoint. In contrast 
to this example, a well-designed cluster algorithm will be applicable to a large 
variety of problems from many different areas. The problems will again be
different for structural pattern recognition, when, for instance, handwritten H’s
should be distinguished from handwritten A’s, and so on.

Verhagen [1975] presents a survey of definitions of pattern recognition 
that also cites the difficulties of any attempt to define this area properly. Bezdek 
[1981, p. 1] defines pattern recognition simply as “A search for structure 
in data.” 

The most effective search procedure, in those instances in which it is
applicable, is still the “eyeball” technique applied by human “searchers.” Their
limitations, however, are severe in some directions: Whenever the dimensionality
of the volume of data exceeds a limit, and the human senses, especially
vision, are not able to recognize data or features, the “eyeball” technique cannot
be applied.

One of the advantages of human search techniques is the ability to recognize 
and classify patterns in a nondichotomous way. One way to imitate this strength 
is the development of statistical methods in mathematical pattern recognition, 
which in connection with high-speed computers have shown very impressive 
results. There are data structures, however, that are not probabilistic in nature or 
not even approximately stochastic. Given the power of existing EDP, it seems 
very appropriate and promising to find nonprobabilistic, nondichotomous models 
and structures that enable us to recognize and transmit in a usable form patterns 
of this type, which humans cannot find without the help of more powerful 
methods than “eyeball-search.” Here, obviously, fuzzy set theory offers some 
promise. Fuzzy set theory has already been successfully applied in different areas 
of pattern search and at different stages of the search process. In the references, 
we cite cases of linguistic pattern search, of character recognition [Chatterji 
1982], of visual scene description [Jain and Nagel 1977], and of texture classifi- 
cation [Hajnal and Koczy 1982]. We also give references for the application 
of fuzzy pattern recognition to medical diagnosis [Fordon and Bezdek 1979; 
Sanchez et al. 1982], to earthquake engineering [Fu et al. 1982], and to pattern 
search in demand [Carlucci and Donati 1977]. 

Another way to describe the main goal of data analysis is complexity reduc- 
tion, in the sense that data masses that cannot be comprehended by human beings 
are reduced to lower-dimensional information that can be used, for instance, by 
human decision makers to support their decisions. 

In data analysis, objects are considered that are described by some attributes. 
Objects can, for example, be persons, things (machines, products, ...), time 
series, sensor signals, process states, and so on. The specific values of the attrib- 
utes are the data to be analyzed. The overall goal is to find structure (informa- 
tion) about these data. This can be achieved by classifying the huge amount of 
data into relatively few classes of similar objects. This leads to a complexity
reduction in the considered application, which allows for improved decisions
based on the information gained.

The process of data analysis normally starts with the description of the 
process or the set of data that is to be analyzed. This process is very nontrivial, 
often least supported by tools, and generally leads to a high-dimensional 
model (one dimension corresponding to one property of the data or process). 
In feature analysis, the first reduction of complexity (dimension) is reached by 
reducing the number of properties to those that are most important, i.e., that 
contribute most to the description of the process or data set. Since this reduction 
is generally not yet sufficient, an additional reduction is achieved by defining 
in feature space a small number of classes. This stage is called classifier design, 
and it more or less terminates the preparatory steps of data analysis. These classes 
are now used, either in a batch type operation or continuously, to assign single
objects or data to classes and thus to extract manageable information for human
operators or subsequent systems. Figure 13-1 shows the interdependent steps of
data analysis as described above.

The methods mentioned in the boxes in figure 13-1 indicate that numerous 
“classical” methods are already available. The process of data analysis described 
so far is not necessarily connected with fuzzy concepts. 

If, however, either features or classes are fuzzy, the use of fuzzy approaches 
is desirable. In figure 13-1, for example, objects, features, and classes are con- 
sidered. Both features and classes can be represented in crisp or fuzzy terms. An 
object is said to be fuzzy if at least one of its features is fuzzy. This leads to the 
following four cases: 


• crisp objects and crisp classes
• crisp objects and fuzzy classes
• fuzzy objects and crisp classes
• fuzzy objects and fuzzy classes


Obviously, the first case is the domain of classical pattern recognition, while the 
latter three cases are the subject of fuzzy data analysis. 


13.2 Methods for Fuzzy Data Analysis 


Figure 13-1 indicates that some boxes, particularly those of feature analysis and
classifier design, contain quite a number of classical dichotomous methods, such
as clustering, regression analysis, etc., which for fuzzy data analysis have been
fuzzified, i.e., modified to suit problem structures with fuzzy elements. The box
“classification,” in contrast, lists some approaches that originate in fuzzy set
theory and that did not exist before.

[Figure 13-1, boxes and the methods listed in them: a first box whose heading is not legible (determination of membership function, feature nomination, scale levels); Feature Analysis (factor analysis, discriminant analysis, regression analysis); Classifier Design (clustering, structured modelling, neural nets, knowledge based approaches); Classification (pattern recognition, diagnosis, ling. approximation, fuzzification, defuzzification, ranking, neural nets).]

Figure 13-1. Scope of data analysis.

In modern fuzzy data analysis, three types of approaches can be distinguished. 
The first class is algorithmic approaches, which in general are fuzzified versions 
of classical methods, such as fuzzy clustering, fuzzy regression, etc. The second 
class is knowledge-based approaches, which are similar to fuzzy control or fuzzy 
expert systems. The third class, (fuzzy) neural net approaches, is growing rapidly 
in number and power. Increasingly combined with these approaches, but not 
discussed in this book, are evolutionary algorithms and genetic algorithms (see 
Zimmermann [1994]). 

The three major classes mentioned above will be discussed in the following
sections of this chapter.


13.2.1 Algorithmic Approaches


For feature analysis, fuzzy regression methods have been used. Recommended 
publications concerning this approach (which will not be discussed in this book) 
are, e.g., Bardossy et al. [1992, 1993], Diamond [1993], Ishibuchi [1992],
Kacprzyk [1992], Peters [1994], and Tanaka [1987]. 

Here we shall focus our attention on clustering methods. 


13.2.1.1 Fuzzy Clustering 


13.2.1.1.1 Clustering Methods. Let us assume that the important problem of
feature extraction, that is, the determination of the characteristics of the
physical process, the image, or other phenomena that are significant indicators of
structural organization, and how to obtain these, has been solved. Our task is then to
divide n objects $x \in X$ characterized by p indicators into c, $2 \le c < n$,
categorically homogeneous subsets called “clusters.” The objects belonging to any one of
the clusters should be similar and the objects of different clusters as dissimilar as
possible. The number of clusters, c, is normally not known in advance.

The most important question to be answered before applying any clustering 
procedure is which mathematical properties of the data set (for example, distance, 
connectivity, intensity, and so on) should be used and in what way they should 
be used in order to identify clusters. This question will have to be answered for 
each specific data set, since there are no universally optimal cluster criteria. 
Figure 13-2 shows a few possible shapes of clusters, and it should be immediately
obvious that a cluster criterion that works in figure 13-2a will perform very
badly in figures 13-2b or 13-2c. More examples can, for instance, be
found in Bezdek [1981] or Roubens [1978] and in many other publications on
cluster analysis and pattern recognition [Ismail 1988, p. 446; Gu and Dubuisson
1990, p. 213].

For further illustration of this point, let us look at an example from Bezdek
[1981, p. 45]. Figure 13-3 shows two data sets, which have been clustered by a
distance-based objective function algorithm (the within-group sum-of-squared-error
criterion) and by applying a distance-based graph-theoretic method
(single-linkage algorithm). Obviously, the criterion that leads to good results in one case
performs very badly in the other case and vice versa. (Crisp) clustering methods
are commonly categorized, according to the type of clustering criterion used, into
hierarchical, graph-theoretic, and objective-functional methods.

Hierarchical clustering methods generate a hierarchy of partitions by means
of a successive merging (agglomerative) or splitting (divisive) of clusters
[Dumitrescu 1988, p. 145]. Such a hierarchy can easily be represented by a
dendrogram, which might be used to estimate an appropriate number of clusters, c,
for other clustering methods. On each level of agglomeration or splitting, a locally
optimal strategy can be used without taking into consideration the policies used
on preceding levels. These methods are not iterative; they cannot change the
assignment of objects to clusters made on preceding levels. Figure 13-4
shows a dendrogram that could be the result of a hierarchical clustering
algorithm. The main advantage of these methods is their conceptual and
computational simplicity.

In fuzzy set theory, this type of clustering method would correspond to the
determination of “similarity trees” such as those shown in example 6-14.

Figure 13-2. Possible data structures in the plane.

Figure 13-3. Performance of cluster criteria.

Figure 13-4. Dendrogram for hierarchical clusters.

Graph-theoretic clustering methods are normally based on some kind of
connectivity of the nodes of a graph representing the data set. The clustering
strategy often consists of breaking edges in a minimal spanning tree to form subgraphs.
If the graph representing the data structure is a fuzzy graph such as those
discussed in chapter 6, then different notions of connectivity lead to different types
of clusters, which in turn can be represented as dendrograms. Yeh and Bang
[1975], for instance, define four different kinds of clusters. For the purpose of
illustrating this approach, we shall consider one of the types of clusters suggested
there.
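The crisp strategy of breaking edges in a minimal spanning tree can be sketched as follows. This is an illustration under stated assumptions, not a method taken from the text: points are tuples in the plane, distances are Euclidean, and the tree is grown with Kruskal's procedure and stopped as soon as c components remain, which has the same effect as building the full tree and removing its c − 1 longest edges. The function name `mst_clusters` and the sample points are hypothetical.

```python
import math
from itertools import combinations


def mst_clusters(points, c):
    """Split points into c clusters by dropping the longest spanning-tree edges."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first (Kruskal's order).
    edges = sorted(
        combinations(range(len(points)), 2),
        key=lambda e: math.dist(points[e[0]], points[e[1]]),
    )
    components = len(points)
    for i, j in edges:
        if components == c:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    groups = {}
    for i, point in enumerate(points):
        groups.setdefault(find(i), []).append(point)
    return sorted(groups.values())


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
print(mst_clusters(points, 2))
```

Because only connectivity matters, this kind of criterion follows elongated or chained structures well but can merge compact groups that are linked by a few intermediate points.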


Definition 13-1 [Yeh and Bang 1975] 


Let G = [V, R] be a symmetric fuzzy graph. Then the degree of a vertex v is
defined as $d(v) = \sum_{u \neq v} \mu_R(u, v)$. The minimum degree of G is $\delta(G) = \min_{v \in V} \{d(v)\}$.

Let G = [V, R] be a symmetric fuzzy graph. G is said to be connected if, for
each pair of vertices u and v in V, $\mu_R(u, v) > 0$. G is called $\tau$-degree connected
for some $\tau > 0$ if $\delta(G) \ge \tau$ and G is connected.


Definition 13-2


Let G = [V, R] be a symmetric fuzzy graph. Clusters are then defined as maximal
$\tau$-degree connected subgraphs of G.
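The quantities in definition 13-1 can be computed for a small graph as follows. The sketch reads the connectedness condition literally as stated above (every pair of vertices has positive membership); the three-vertex graph and its membership values are hypothetical, and the search for maximal subgraphs required by definition 13-2 is not attempted here.

```python
# Symmetric fuzzy graph: mu[(u, v)] is the degree of membership of edge {u, v}.
# The values are hypothetical and serve only as an illustration.
vertices = ["a", "b", "c"]
mu = {("a", "b"): 0.8, ("a", "c"): 0.5, ("b", "c"): 0.3}


def mu_R(u, v):
    """Symmetric lookup of the edge membership; missing edges have degree 0."""
    return mu.get((u, v), mu.get((v, u), 0.0))


def degree(v):
    """d(v): sum of the memberships of all edges incident to v."""
    return sum(mu_R(u, v) for u in vertices if u != v)


def min_degree():
    """delta(G): smallest vertex degree in the graph."""
    return min(degree(v) for v in vertices)


def is_tau_degree_connected(tau):
    """Check delta(G) >= tau together with the connectedness condition."""
    connected = all(mu_R(u, v) > 0 for u in vertices for v in vertices if u != v)
    return connected and min_degree() >= tau


print({v: round(degree(v), 2) for v in vertices}, round(min_degree(), 2))
```

For this graph the degrees are 1.3, 1.1, and .8, so the graph is τ-degree connected for every τ up to .8.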


Example 13-1 [Yeh and Bang 1975, p. 145] 


Let G be the symmetric fuzzy graph shown in figure 13-5. The dendrogram in
figure 13-6 shows all clusters for different levels of $\tau$. For further details, see Yeh
and Bang [1975].

Objective-function methods allow the most precise formulation of the cluster- 
ing criterion. The “desirability” of clustering candidates is measured for each c, 
the number of clusters, by an objective function. Typically, local extrema of the 
objective function are defined as optimal clusterings. Many different objective 
functions have been suggested for clustering (crisp clustering as well as fuzzy 
clustering). The interested reader is referred in particular to the excellent book 
by Bezdek [1981] for more details and many references. We shall limit our con- 
siderations to one frequently used type of (fuzzy) clustering method, the so-called 
c-means algorithm. 

Classical (crisp) clustering algorithms generate partitions such that each object 
is assigned to exactly one cluster. Often, however, objects cannot adequately be 
assigned to strictly one cluster (because they are located “between” clusters). In 
these cases, fuzzy clustering methods provide a much more adequate tool for 
representing real-data structures. 

Figure 13-5. Fuzzy graph. 

Figure 13-6. Dendrogram for graph-theoretic clusters. 

To illustrate the difference between the results of crisp and fuzzy clustering 
methods, let us look at an example used very extensively in the clustering 
literature: the butterfly. 

Figure 13-7. The butterfly. 


Example 13-2 


The data set X consists of 15 points in the plane, as depicted in figure 13-7. Clus- 
tering these points by a crisp objective-function algorithm might yield the picture 
shown in figure 13-8, in which “1” indicates membership of the point in the left- 
hand cluster and “0” membership in the right-hand cluster. The x’s indicate the 
centers of the clusters. Figures 13-9 and 13-10, respectively, show the degrees 
of membership the points might have to the two clusters when using a fuzzy 
clustering algorithm. 

We observe that, even though the butterfly is symmetric, the clusters in figure 
13-8 are not symmetric because point $x_8$, the point “between” the clusters, has 
to be (fully) assigned to either cluster 1 or cluster 2. In figures 13-9 and 13-10, 
this point has the degree of membership .5 in both clusters, which seems to be 
more appropriate. Details of the methods used to arrive at figures 13-8 to 13-10 
can be found in Bezdek [1981, p. 52] or Ruspini [1973]. 


Let us now consider the clustering methods themselves. 

Let the data set $X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^p$ be a subset of the real p-dimensional 
vector space $\mathbb{R}^p$. Each $x_k = (x_{k1}, \ldots, x_{kp}) \in \mathbb{R}^p$ is called a feature vector. $x_{kj}$ is the 
jth feature of observation $x_k$. 

Figure 13-8. Crisp clusters of the butterfly. 

Figure 13-9. Cluster 1 of the butterfly. 

Figure 13-10. Cluster 2 of the butterfly. 

Since the elements of a cluster shall be as similar to each other as possible and 
the clusters as dissimilar as possible, the clustering process is controlled by use 
of similarity measures. One normally defines the “dissimilarity” or “distance” of 
two objects $x_k$ and $x_l$ as a real-valued function $d: X \times X \to \mathbb{R}^+$ that satisfies 

\[ d(x_k, x_l) = d_{kl} \geq 0 \]
\[ d_{kl} = 0 \Leftrightarrow x_k = x_l \]
\[ d_{kl} = d_{lk} \]

If additionally $d$ satisfies the triangle inequality, that is, 

\[ d_{kl} \leq d_{kj} + d_{jl} \]

then $d$ is a metric, a property that is not always required. If each feature vector 
is considered as a point in the p-dimensional space, then the dissimilarity $d_{kl}$ of 
two points $x_k$ and $x_l$ can be interpreted as the distance between these points. 
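
The distinction between a dissimilarity and a metric can be checked numerically on a finite data set. The sketch below is illustrative only: the helper names and the three sample points are assumptions, not part of the text. It tests the three basic conditions and the triangle inequality separately, and shows that the squared Euclidean distance satisfies the former but not the latter.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def squared_euclidean(x, y):
    """Squared Euclidean distance (a dissimilarity, but not a metric)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def is_dissimilarity(points, d, tol=1e-12):
    """Check d >= 0, d = 0 exactly for identical points, and symmetry."""
    for x in points:
        for y in points:
            dxy = d(x, y)
            if dxy < 0:
                return False
            if (dxy <= tol) != (x == y):
                return False
            if abs(dxy - d(y, x)) > tol:
                return False
    return True


def satisfies_triangle(points, d, tol=1e-12):
    """Check d(x, z) <= d(x, y) + d(y, z) for all triples of points."""
    return all(
        d(x, z) <= d(x, y) + d(y, z) + tol
        for x in points for y in points for z in points
    )


# Hypothetical sample: three collinear points.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(satisfies_triangle(points, euclidean))          # holds for the Euclidean distance
print(satisfies_triangle(points, squared_euclidean))  # fails: 4 > 1 + 1
```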

Each partition of the set $X = \{x_1, \ldots, x_n\}$ into crisp subsets $S_i$ or fuzzy subsets 
$\tilde{S}_i$ $(i = 1, \ldots, c)$ can fully be described by an indicator function $u_{S_i}$ or a 
membership function $\mu_{\tilde{S}_i}$, respectively. In order to stay in line with the 
terminology of the preceding chapters, we shall use, for crisp clustering methods, 

\[ u_{S_i}: X \to \{0, 1\} \]

and, for fuzzy cases, 

\[ \mu_{\tilde{S}_i}: X \to [0, 1] \]

where $u_{ik}$ and $\mu_{ik}$ denote the degree of membership of object $x_k$ in the subset 
$S_i$ or $\tilde{S}_i$, that is, 

\[ u_{ik} := u_{S_i}(x_k) \]
\[ \mu_{ik} := \mu_{\tilde{S}_i}(x_k) \]


Definition 13-3 


Let $X = \{x_1, \ldots, x_n\}$ be any finite set. $V_{cn}$ is the set of all real $c \times n$ matrices, and 
$2 \leq c < n$ is an integer. The matrix $U = [u_{ik}] \in V_{cn}$ is called a crisp c-partition if 
it satisfies the following conditions: 

1. $u_{ik} \in \{0, 1\}$, $1 \leq i \leq c$, $1 \leq k \leq n$ 
2. $\sum_{i=1}^{c} u_{ik} = 1$, $1 \leq k \leq n$ 
3. $0 < \sum_{k=1}^{n} u_{ik} < n$, $1 \leq i \leq c$ 

The set of all matrices that satisfy these conditions is called $M_c$. 


Example 13-3 


Let $X = \{x_1, x_2, x_3\}$. Then there are the following three crisp 2-partitions 
(the columns correspond to $x_1$, $x_2$, $x_3$): 

\[ U_1 = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
   U_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \end{bmatrix}, \quad
   U_3 = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \]

Obviously, conditions (2) and (3) of the definition rule out the 
following partitions: 

\[ \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}, \quad
   \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix} \]


Definition 13-4 


$X$, $V_{cn}$, and $c$ are as in definition 13-3. $\tilde{U} = [\mu_{ik}] \in V_{cn}$ is called a fuzzy c-partition 
if it satisfies the following conditions [Bezdek 1981, p. 26]: 

1. $\mu_{ik} \in [0, 1]$, $1 \leq i \leq c$, $1 \leq k \leq n$ 
2. $\sum_{i=1}^{c} \mu_{ik} = 1$, $1 \leq k \leq n$ 
3. $0 < \sum_{k=1}^{n} \mu_{ik} < n$, $1 \leq i \leq c$ 

$M_{fc}$ will denote the set of all matrices satisfying the above conditions. By 
contrast to the crisp c-partition, elements can now belong to several clusters and to 
different degrees. Conditions (2) and (3) just require that the “total membership” 
of an element is normalized to 1 and that the element cannot belong to more 
clusters than exist. 
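
The conditions of definitions 13-3 and 13-4 translate directly into a membership-matrix check. The following sketch assumes the matrix is given as a list of c rows of length n; the function names and the tolerance `tol` are illustrative choices, not part of the definitions.

```python
def is_fuzzy_c_partition(U, tol=1e-9):
    """Check conditions (1)-(3) of a fuzzy c-partition for a c x n matrix U."""
    c, n = len(U), len(U[0])
    if any(len(row) != n for row in U):
        return False
    # (1) every membership lies in [0, 1]
    if any(not (0 <= u <= 1) for row in U for u in row):
        return False
    # (2) the memberships of each element (column) sum to 1
    if any(abs(sum(U[i][k] for i in range(c)) - 1) > tol for k in range(n)):
        return False
    # (3) no cluster (row) is empty and no cluster contains everything
    return all(0 < sum(row) < n for row in U)


def is_crisp_c_partition(U):
    """A crisp c-partition is a fuzzy c-partition with entries in {0, 1}."""
    return is_fuzzy_c_partition(U) and all(u in (0, 1) for row in U for u in row)


print(is_crisp_c_partition([[1, 1, 0], [0, 0, 1]]))              # a valid crisp 2-partition
print(is_crisp_c_partition([[1, 1, 0], [0, 1, 1]]))              # violates condition (2)
print(is_crisp_c_partition([[1, 1, 1], [0, 0, 0]]))              # violates condition (3)
print(is_fuzzy_c_partition([[0.8, 0.5, 0.2], [0.2, 0.5, 0.8]]))  # fuzzy, not crisp
```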


Example 13-4 


Let $X = \{x_1, x_2, x_3\}$. Then there exist infinitely many possible fuzzy 2-partitions, 
such as (the columns correspond to $x_1$, $x_2$, $x_3$) 

\[ \tilde{U}_1 = \begin{bmatrix} 1 & .5 & 0 \\ 0 & .5 & 1 \end{bmatrix}, \quad
   \tilde{U}_2 = \begin{bmatrix} .8 & .5 & .2 \\ .2 & .5 & .8 \end{bmatrix} \]

and so on. 
Our butterfly example (figure 13-7), for instance, could have the following 
partition $\tilde{U}$: 

            x1   x2   x3   x4   x5   x6   x7   x8   x9   x10  x11  x12  x13  x14  x15 
cluster 1   .86  .97  .86  .94  .99  .94  .86  .5   .14  .06  .01  .06  .14  .03  .14 
cluster 2   .14  .03  .14  .06  .01  .06  .14  .5   .86  .94  .99  .94  .86  .97  .86 




The location of a cluster is represented by its “cluster center” $v_i = (v_{i1}, \ldots, v_{ip}) 
\in \mathbb{R}^p$, $i = 1, \ldots, c$, around which its objects are concentrated. 
Let $v = (v_1, \ldots, v_c) \in \mathbb{R}^{cp}$ be the vector of all cluster centers, where the $v_i$ in 
general do not correspond to elements of $X$. 

One of the frequently used criteria to improve an initial partition is the so- 
called variance criterion. This criterion measures the dissimilarity between the 
points in a cluster and its cluster center by the Euclidean distance. This distance, 
$d_{ik}$, is then [Bezdek 1981, p. 54]: 

\[ d_{ik} = d(x_k, v_i) = \|x_k - v_i\| = \Bigl[ \sum_{j=1}^{p} (x_{kj} - v_{ij})^2 \Bigr]^{1/2} \]


The variance criterion for crisp partitions corresponds to minimizing the sum 
of the variances of all variables $j$ in each cluster $i$, with $|S_i| = n$, and yields 

\[ \min \sum_{i=1}^{c} \sum_{j=1}^{p} \frac{1}{n} \sum_{x_k \in S_i} (x_{kj} - v_{ij})^2
   = \min \frac{1}{n} \sum_{i=1}^{c} \sum_{x_k \in S_i} \sum_{j=1}^{p} (x_{kj} - v_{ij})^2. \]

As indicated by the above transformation, the variance criterion corresponds 
(except for the factor $1/n$) to minimizing the sum of the squared Euclidean 
distances. The criterion itself amounts to solving the following problem: 

\[ \min z(S_1, \ldots, S_c; v) = \sum_{i=1}^{c} \sum_{x_k \in S_i} \|x_k - v_i\|^2 \]

such that 

\[ v_i = \frac{1}{|S_i|} \sum_{x_k \in S_i} x_k \]

Using definition 13-3, the variance criterion for crisp c-partitions can be 
written as 

\[ \min z(U, v) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik} \|x_k - v_i\|^2 \]

such that 

\[ v_i = \frac{1}{\sum_{k=1}^{n} u_{ik}} \sum_{k=1}^{n} u_{ik} x_k \]

For fuzzy c-partitions according to definition 13-4, the variance criterion 
amounts to solving the following problem: 

\[ \min z(\tilde{U}, v) = \sum_{i=1}^{c} \sum_{k=1}^{n} (\mu_{ik})^m \|x_k - v_i\|^2 \]

such that 

\[ v_i = \frac{1}{\sum_{k=1}^{n} (\mu_{ik})^m} \sum_{k=1}^{n} (\mu_{ik})^m x_k, \quad m > 1 \]

Here $v_i$ is the mean of the $x_k$, m-weighted by their degrees of membership. That 
means that the $x_k$ with high degrees of membership have a higher influence on $v_i$ 
than those with low degrees of membership. This tendency is strengthened by $m$, 
the importance of which we will discuss in more detail at a later time. It was 
shown (see, for instance, Bock [1979a, p. 144]) that, given a partition $\tilde{U}$, the 
clusters $\tilde{S}_i$ are best represented by the centers $v_i$ as described above. 
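
To make the m-weighted mean concrete, the sketch below computes the cluster centers and the value of the variance criterion for a given membership matrix. The function names and the three sample points are assumptions made for illustration; for a crisp matrix the weights $(u_{ik})^m$ reduce to $u_{ik}$, so the same code also covers the crisp case.

```python
def fuzzy_centers(X, U, m=2.0):
    """Cluster centers as means of the data, weighted by membership ** m."""
    p = len(X[0])
    centers = []
    for row in U:
        w = [u ** m for u in row]
        total = sum(w)
        centers.append(
            [sum(wk * x[j] for wk, x in zip(w, X)) / total for j in range(p)]
        )
    return centers


def variance_criterion(X, U, centers, m=2.0):
    """Sum of squared Euclidean distances, weighted by membership ** m."""
    return sum(
        (U[i][k] ** m) * sum((xj - vj) ** 2 for xj, vj in zip(X[k], centers[i]))
        for i in range(len(U))
        for k in range(len(X))
    )


# Hypothetical data: two points near the origin and one far away.
X = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
U = [[1, 1, 0], [0, 0, 1]]  # crisp 2-partition
centers = fuzzy_centers(X, U)
print(centers)                             # [[1.0, 0.0], [10.0, 0.0]]
print(variance_criterion(X, U, centers))   # 2.0
```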

If we generalize the criterion concerning the used norm, the crisp clustering 
problem can be stated as follows: Let $G$ be a $(p \times p)$ matrix, which is symmetric 
and positive-definite. Then we can define a general norm 

\[ \|x_k - v_i\|_G^2 = (x_k - v_i)^T G (x_k - v_i) \]

The possible influence of the chosen norm, determined by the choice of $G$, 
will be discussed later. This yields the formulation of the problem: 

\[ \min z(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik} \|x_k - v_i\|_G^2 \]

such that 

\[ U \in M_c, \quad v \in \mathbb{R}^{cp} \]


This is a combinatorial optimization problem that is hard to solve, even for 
rather small values of $c$ and $n$. In fact, the number of distinct ways to partition $X$ 
into $c$ nonempty subsets is 

\[ |M_c| = \frac{1}{c!} \Bigl[ \sum_{j=1}^{c} \binom{c}{j} (-1)^{c-j} j^n \Bigr] \]

which for $c = 10$ and $n = 25$ is already roughly $10^{18}$ distinct 10-partitions of the 
25 points [Bezdek 1981, p. 29]. 
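
The size of this search space can be evaluated exactly with integer arithmetic. The sketch below assumes the standard counting formula for partitions of n objects into c nonempty unlabeled subsets (the Stirling number of the second kind); the function name is illustrative.

```python
from math import comb, factorial


def count_crisp_partitions(n, c):
    """Number of ways to split n objects into c nonempty subsets:
    (1 / c!) * sum_{j=1..c} (-1)^(c-j) * C(c, j) * j^n."""
    total = sum((-1) ** (c - j) * comb(c, j) * j ** n for j in range(1, c + 1))
    return total // factorial(c)


print(count_crisp_partitions(3, 2))    # the three 2-partitions of three elements
print(count_crisp_partitions(25, 10))  # on the order of 10**18
```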
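The basic definition of the fuzzy partitioning problem for $m > 1$ is 

\[ \min z_m(\tilde{U}; v) = \sum_{k=1}^{n} \sum_{i=1}^{c} (\mu_{ik})^m \|x_k - v_i\|_G^2 \qquad (P_m) \]

such that 

\[ \tilde{U} \in M_{fc}, \quad v \in \mathbb{R}^{cp} \]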

$(P_m)$ is an analytical problem, which has the advantage that by using differential 
calculus one can determine necessary conditions for local optima. Differentiating 
the objective function with respect to $v_i$ (for fixed $\tilde{U}$) and to $\mu_{ik}$ (for fixed $v$) 
and applying the condition $\sum_{i=1}^{c} \mu_{ik} = 1$, one obtains (see [Bezdek 1981, p. 67]): 

\[ v_i = \frac{1}{\sum_{k=1}^{n} (\mu_{ik})^m} \sum_{k=1}^{n} (\mu_{ik})^m x_k, \quad i = 1, \ldots, c \qquad (13.1) \]

\[ \mu_{ik} = \frac{\Bigl( \dfrac{1}{\|x_k - v_i\|_G^2} \Bigr)^{1/(m-1)}}
                   {\sum_{j=1}^{c} \Bigl( \dfrac{1}{\|x_k - v_j\|_G^2} \Bigr)^{1/(m-1)}},
   \quad i = 1, \ldots, c; \; k = 1, \ldots, n \qquad (13.2) \]

Let us now comment on the role and importance of $m$: It is called the exponential 
weight, and it reduces the influence of “noise” when computing the cluster 
centers in equation (13.1) (see Windham [1982, p. 358]) and the value of the 
objective function $z_m(\tilde{U}; v)$. $m$ reduces the influence of small $\mu_{ik}$ (points further 
away from $v_i$) compared to that of large $\mu_{ik}$ (points close to $v_i$). The larger $m > 1$, 
the stronger is this influence. 

The systems described by equations (13.1) and (13.2) cannot be solved analytically. 
There exist, however, iterative algorithms (nonhierarchical) that approximate 
the minimum of the objective function, starting from a given position. One 
of the best-known algorithms for the crisp clustering problem is the (hard) c-means 
algorithm or (basic) ISODATA-algorithm. Similarly, the fuzzy clustering 
problem can be solved by using the fuzzy c-means algorithm, which shall be 
described in more detail in the following. 

The fuzzy c-means algorithm [Bezdek 1981, p. 69]. For each $m \in (1, \infty)$, a 
fuzzy c-means algorithm can be designed that iteratively solves the necessary 
conditions (13.1) and (13.2) above and converges to a local optimum (for proofs 
of convergence, see Bezdek [1981] and Bock [1979]). 

The algorithm comprises the following steps: 


Step 1. Choose $c$ $(2 \leq c < n)$, $m$ $(1 < m < \infty)$, and the $(p, p)$-matrix $G$ with 
$G$ symmetric and positive-definite. Initialize $\tilde{U}^{(0)} \in M_{fc}$, set $l = 0$. 

Step 2. Calculate the $c$ fuzzy cluster centers $\{v_i^{(l)}\}$ by using $\tilde{U}^{(l)}$ from condition 
(13.1). 

Step 3. Calculate the new membership matrix $\tilde{U}^{(l+1)}$ by using $\{v_i^{(l)}\}$ from 
condition (13.2) if $x_k \neq v_i^{(l)}$. Else set 

\[ \mu_{jk} = \begin{cases} 1 & \text{for } j = i \\ 0 & \text{for } j \neq i \end{cases} \]

Step 4. Choose a suitable matrix norm and calculate $\Delta = \|\tilde{U}^{(l+1)} - \tilde{U}^{(l)}\|_G$. If 
$\Delta > \varepsilon$, set $l = l + 1$ and go to step 2. If $\Delta \leq \varepsilon$, stop. 

For the fuzzy c-means algorithm, a number of parameters have to be chosen: 

the number of clusters $c$, $2 \leq c < n$; 

the exponential weight $m$, $1 < m < \infty$; 

the $(p, p)$ matrix $G$ ($G$ symmetric and positive-definite), which induces a norm; 

the method to initialize the membership matrix $\tilde{U}^{(0)}$; 

the termination criterion $\Delta = \|\tilde{U}^{(l+1)} - \tilde{U}^{(l)}\|_G \leq \varepsilon$. 
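
Steps 1 to 4 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: it uses the Euclidean norm ($G$ equal to the identity), measures the change between successive membership matrices by the largest absolute entry difference (one possible matrix norm), and adds an iteration cap `max_iter` (at least 1) as a safeguard. All function names and the sample data are hypothetical.

```python
def sq_dist(x, v):
    """Squared Euclidean distance (G = identity is assumed)."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def update_centers(X, U, m):
    """Step 2: cluster centers from condition (13.1)."""
    p = len(X[0])
    centers = []
    for row in U:
        w = [u ** m for u in row]
        total = sum(w)
        centers.append(
            [sum(wk * x[j] for wk, x in zip(w, X)) / total for j in range(p)]
        )
    return centers


def update_memberships(X, centers, m):
    """Step 3: memberships from condition (13.2), with the special case
    that a point coinciding with a center belongs fully to that cluster."""
    c, n = len(centers), len(X)
    U = [[0.0] * n for _ in range(c)]
    for k, x in enumerate(X):
        d2 = [sq_dist(x, v) for v in centers]
        zeros = [i for i, d in enumerate(d2) if d == 0.0]
        if zeros:
            U[zeros[0]][k] = 1.0
            continue
        inv = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in d2]
        s = sum(inv)
        for i in range(c):
            U[i][k] = inv[i] / s
    return U


def fuzzy_c_means(X, U0, m=2.0, eps=0.01, max_iter=100):
    """Iterate steps 2-4 until the membership matrix changes by at most eps."""
    U = [list(row) for row in U0]
    for _ in range(max_iter):
        centers = update_centers(X, U, m)
        U_new = update_memberships(X, centers, m)
        delta = max(
            abs(a - b) for ra, rb in zip(U_new, U) for a, b in zip(ra, rb)
        )
        U = U_new
        if delta <= eps:
            break
    return U, centers


# Hypothetical one-dimensional data: two groups and one point in between.
X = [(0.0,), (1.0,), (2.0,), (5.0,), (8.0,), (9.0,), (10.0,)]
row = [0.6, 0.6, 0.6, 0.5, 0.4, 0.4, 0.4]
U0 = [row, [1.0 - u for u in row]]
U, centers = fuzzy_c_means(X, U0, m=2.0, eps=0.01)
print([round(u, 2) for u in U[0]])
```

In this sample the middle point ends up with a membership close to .5 in both clusters, mirroring the behavior described for the butterfly.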


Example 13-5 [Bezdek 1981, p. 74] 


The data of the butterfly shown in figure 13-7 were processed with a fuzzy 
2-means algorithm, using as a starting partition 

\[ \tilde{U}^{(0)} = \begin{bmatrix} .854 & .146 & .854 & \ldots & .854 \\ .146 & .854 & .146 & \ldots & .146 \end{bmatrix}_{2 \times 15} \]

$\varepsilon$ was chosen to be .01; the Euclidean norm was used for $G$; and $m$ was set to 
1.25. Termination in six iterations resulted in the memberships and cluster centers 
shown in figure 13-11. For $m = 2$, the resulting clusters are shown in figure 13-12. 


As for other iterative algorithms for improving starting partitions, the number c 
has to be chosen suitably. If there does not exist any information about a good c, 
the computations are carried out for several values of c. In a second step, the best 
of these partitions is selected. 

Figure 13-11. Clusters for m = 1.25. 

Figure 13-12. Clusters for m = 2. 

The exponential weight $m$ influences the membership matrix. The larger the 
$m$, the fuzzier becomes the membership matrix of the final partition. For $m \to \infty$, 
$\tilde{U}$ approaches $\tilde{U} = [1/c]$. This is, of course, a very undesirable solution, because 
each $x_k$ is assigned to each cluster with the same degree of membership. 

Basically, less fuzzy membership matrices are preferable because higher 
degrees of membership indicate a higher concentration of the points around the 
respective cluster centers. No theoretically justified rule for choosing m exists. 
Usually m = 2 is chosen. 

$G$ determines the shape of the clusters that can be identified by the fuzzy 
c-means algorithm. If one chooses the Euclidean norm $N_E$, then $G$ is the identity 
matrix $I$, and the shape of the clusters is assumed to be an equally sized hypersphere. 
Other frequently used norms are the diagonal norm or the Mahalanobis 
norm, for which $G_D = [\mathrm{diag}(\sigma_j^2)]^{-1}$ and $G_M = [\mathrm{cov}(x)]^{-1}$, respectively, where $\sigma_j^2$ 
denotes the variance of feature $j$. 
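
How the choice of $G$ rescales distances can be seen in a small sketch. The helper names below are illustrative, the population variance (division by n) is an assumption, and every feature is assumed to have nonzero variance. The Mahalanobis case would additionally require inverting the covariance matrix and is omitted.

```python
def g_norm_sq(x, v, G):
    """Squared G-norm (x - v)^T G (x - v) for a p x p matrix G."""
    d = [a - b for a, b in zip(x, v)]
    p = len(d)
    return sum(d[r] * G[r][s] * d[s] for r in range(p) for s in range(p))


def identity_matrix(p):
    """G for the Euclidean norm."""
    return [[1.0 if r == s else 0.0 for s in range(p)] for r in range(p)]


def diagonal_norm_matrix(X):
    """G for the diagonal norm: inverse of the diagonal matrix of feature
    variances (population variance assumed)."""
    n, p = len(X), len(X[0])
    means = [sum(x[j] for x in X) / n for j in range(p)]
    variances = [sum((x[j] - means[j]) ** 2 for x in X) / n for j in range(p)]
    return [[1.0 / variances[r] if r == s else 0.0 for s in range(p)]
            for r in range(p)]


# Hypothetical data whose second feature varies more than the first.
X = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0), (2.0, 4.0)]
G_D = diagonal_norm_matrix(X)
print(g_norm_sq((2.0, 4.0), (0.0, 0.0), identity_matrix(2)))  # 20.0
print(g_norm_sq((2.0, 4.0), (0.0, 0.0), G_D))                 # 8.0
```

With the diagonal norm, a difference along the widely spread second feature counts for less than the same difference along the first.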

The final partition depends on the initially chosen starting position. When 
choosing an appropriate c, if there exists a good clustering structure, the final par- 
titions generated by a fuzzy c-means algorithm are rather stable. 

A number of variations of the above algorithm are described in Bezdek [1981]. 
The interested reader is referred to this reference for further details. Numerical 
results for a number of algorithms are also presented in Roubens [1978]. 


13.2.1.1.2 Cluster Validity 


Complex algorithms stand squarely between the data for which substructure is 
hypothesized and the solutions they generate; hence it is all but impossible to transfer a 
theoretical null hypothesis about $X$ to $\tilde{U} \in M_{fc}$, which can be used to statistically 
substantiate or repudiate the validity of algorithmically suggested clusters. As a result a 
number of scalar measures of partition fuzziness (which are interesting in their own 
right) have been used as heuristic validity indicants [Bezdek 1981, p. 95]. 


Actually, the so-called cluster validity problem concerns the quality or the 
degree to which the final partition of a cluster algorithm approximates the real or 
hypothesized structure of a set of data. Most often this question is reduced, 
however, to the search for a “correct” c. Cluster validity is also relevant 
when deciding which of a number of starting partitions should be selected for 
improvement. 

For measuring cluster validity in fuzzy clustering, some criteria from crisp 
cluster analysis have been adapted to fuzzy clustering. In particular, the so- 
called validity functionals used express the quality of a solution by measuring its 
degree of fuzziness. While criteria for cluster validity are closely related to the 
mathematical formulation of the problem, criteria to judge the real “appro- 
priateness” of a final partition consider primarily real rather than mathematical 
features. 

Let us first consider some criteria taken from traditional crisp clustering. 

One of the most straightforward criteria is the value of the objective function. 
Since it decreases monotonically with increasing number of clusters, c, that is, it 
reaches its minimum for c = n, one chooses the c* for which a large decrease is 
obtained when going from c* to c* + 1. Another criterion is the rate of conver- 
gence. This is justified because experience has shown that, for a good clustering 
structure and for an appropriate c, a high rate of convergence can generally be 
obtained. 

Because the “optimal” final partition depends on the initialization of the starting 
partition $\tilde{U}^{(0)}$, the “stability” of the final partition with respect to different starting 
partitions can also be used as an indication of a “correct” number of clusters $c$. 

All three criteria serve to determine the “correct” number of clusters. They are 
heuristic in nature and therefore might lead to final partitions that do not correctly 
identify existing clusters. Bezdek shows, for instance, that the global minimum 
of the objective function is not necessarily reached for the correct partition 
[Bezdek 1981, pp. 96 ff]. Therefore other measures of cluster validity are needed 
in order to judge the quality of a partition. 

The following criteria calculate cluster validity functionals that assign to each 
fuzzy final partition a scalar that is supposed to indicate the quality of the clus- 
tering solution. When designing such criteria, one assumes that the clustering 
structure is better identified when more points concentrate around the cluster 
centers, that is, the crisper (unfuzzier) is the membership matrix of the final 
partition generated by the fuzzy c-means algorithm. 

The best-known measures for judging the fuzziness of a clustering solution are 


the partition coefficient, $F(\tilde{U}, c)$, 
the partition entropy, $H(\tilde{U}, c)$, and 
the proportion exponent, $P(\tilde{U}, c)$. 


Definition 13-5 [Bezdek 1981, p. 100] 


Let $\tilde{U} \in M_{fc}$ be a fuzzy c-partition of $n$ data points. The partition coefficient of 
$\tilde{U}$ is the scalar 

\[ F(\tilde{U}, c) = \frac{1}{n} \sum_{k=1}^{n} \sum_{i=1}^{c} (\mu_{ik})^2 \]


Definition 13-6 [Bezdek 1981, p. 111] 


The partition entropy of any fuzzy c-partition $\tilde{U} \in M_{fc}$ of $X$, where $|X| = n$, is, 
for $1 \leq c \leq n$, 

\[ H(\tilde{U}, c) = -\frac{1}{n} \sum_{k=1}^{n} \sum_{i=1}^{c} \mu_{ik} \log_e(\mu_{ik}) \]

(See definition 4-3a, b, where the entropy was already used as a measure of 
fuzziness.) 


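Definition 13-7 [Windham 1981, p. 178; Bezdek 1981, p. 119] 


Let $\tilde{U} \in (M_{fc} \setminus M_c)$ be a fuzzy c-partition of $X$; $|X| = n$; $2 \leq c < n$. For column $k$ 
of $\tilde{U}$, $1 \leq k \leq n$, let 

\[ \mu_k = \max_{1 \leq i \leq c} \{\mu_{ik}\} \]

\[ [\mu_k^{-1}] = \text{greatest integer} \leq \Bigl( \frac{1}{\mu_k} \Bigr) \]

The proportion exponent of $\tilde{U}$ is the scalar 

\[ P(\tilde{U}, c) = -\log_e \Bigl\{ \prod_{k=1}^{n} \Bigl[ \sum_{j=1}^{[\mu_k^{-1}]} (-1)^{j+1} \binom{c}{j} (1 - j \mu_k)^{c-1} \Bigr] \Bigr\} \]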


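The above-mentioned measures have the following properties: 

\[ \frac{1}{c} \leq F(\tilde{U}, c) \leq 1 \]
\[ 0 \leq H(\tilde{U}, c) \leq \log_e(c) \]
\[ 0 \leq P(\tilde{U}, c) < \infty \]

The partition coefficient and the partition entropy are similar in so far as they 
attain their extrema for crisp partitions $U \in M_c$: 

\[ F(\tilde{U}, c) = 1 \Leftrightarrow H(\tilde{U}, c) = 0 \Leftrightarrow \tilde{U} \in M_c \]
\[ F(\tilde{U}, c) = \frac{1}{c} \Leftrightarrow H(\tilde{U}, c) = \log_e(c) \Leftrightarrow \tilde{U} = \Bigl[ \frac{1}{c} \Bigr] \]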
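
The two extreme cases can be verified numerically. The sketch below follows definitions 13-5 and 13-6; the function names are illustrative, and terms with zero membership are skipped in the entropy, using the usual convention that $0 \cdot \log 0$ is taken as 0 (an assumption not spelled out in the text).

```python
import math


def partition_coefficient(U):
    """F(U, c): sum of squared memberships divided by the number of points."""
    n = len(U[0])
    return sum(u * u for row in U for u in row) / n


def partition_entropy(U):
    """H(U, c): negative sum of u * ln(u) divided by the number of points.
    Zero memberships are skipped (0 * log 0 is taken as 0)."""
    n = len(U[0])
    return -sum(u * math.log(u) for row in U for u in row if u > 0) / n


crisp = [[1, 1, 0], [0, 0, 1]]
uniform = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
print(partition_coefficient(crisp), partition_entropy(crisp))      # F = 1, H = 0
print(partition_coefficient(uniform), partition_entropy(uniform))  # F = 1/c, H = ln(c)
```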


The (heuristic) rules for selecting the “correct” or best partitions are 

\[ \max_{c} \Bigl\{ \max_{\tilde{U} \in \Omega_c} \{F(\tilde{U}, c)\} \Bigr\}, \quad c = 2, \ldots, n-1 \]
\[ \min_{c} \Bigl\{ \min_{\tilde{U} \in \Omega_c} \{H(\tilde{U}, c)\} \Bigr\}, \quad c = 2, \ldots, n-1 \]

where $\Omega_c$ is the set of all “optimal” solutions for given $c$. 

The limitations of $F(\tilde{U}, c)$ and $H(\tilde{U}, c)$ are mainly their monotonicity and the 
lack of any suitable benchmark that would allow a judgment as to the acceptability 
of a final partition. The monotonicity will usually tend to indicate that the 
“correct” partition is the 2-partition. This problem can be solved, for instance, by 
choosing the $c^*$-partition for which the value of $H(\tilde{U}, c)$ lies below the trend when 
going from $c^* - 1$ to $c^*$. 

$H(\tilde{U}, c)$ is normally more sensitive with respect to a change of the partition 
than is $F(\tilde{U}, c)$. This is particularly so if $m$ is varied. 

While $F(\tilde{U}, c)$ and $H(\tilde{U}, c)$ depend on all $c \cdot n$ elements, the proportion exponent 
$P(\tilde{U}, c)$ depends on the maximum degree of membership of the $n$ elements. 
$P(\tilde{U}, c)$ converges towards $\infty$ with increasing $\mu_k$, and it is not defined for $\mu_k = 1$. 

The heuristic for choosing a good partition is 

\[ \max_{c} \Bigl\{ \max_{\tilde{U} \in \Omega_c} \{P(\tilde{U}, c)\} \Bigr\}, \quad c = 2, \ldots, n-1 \]

By contrast to $F(\tilde{U}, c)$ and $H(\tilde{U}, c)$, $P(\tilde{U}, c)$ has the advantage that it is not 
monotone in $c$. There exist, however, no benchmarks such that one can judge the 
quality of a partition $c^*$ from the value of $P(\tilde{U}, c^*)$. 
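
The proportion exponent of definition 13-7 can be evaluated along the following lines. This is a sketch under stated assumptions: the function name is illustrative, the greatest integer not exceeding $1/\mu_k$ is obtained by truncating the floating-point quotient (which may be off by one when $1/\mu_k$ is extremely close to an integer), and the function is only meaningful for partitions in which no column contains a membership of exactly 1.

```python
import math
from math import comb


def proportion_exponent(U):
    """P(U, c) = -ln of the product, over all columns k, of
    sum_{j=1..floor(1/mu_k)} (-1)^(j+1) * C(c, j) * (1 - j*mu_k)^(c-1),
    where mu_k is the largest membership in column k."""
    c, n = len(U), len(U[0])
    log_product = 0.0
    for k in range(n):
        mu_k = max(U[i][k] for i in range(c))
        top = int(1.0 / mu_k)  # greatest integer <= 1 / mu_k (float truncation)
        s = sum(
            (-1) ** (j + 1) * comb(c, j) * (1.0 - j * mu_k) ** (c - 1)
            for j in range(1, top + 1)
        )
        log_product += math.log(s)  # fails for mu_k = 1, where P is undefined
    return -log_product


print(proportion_exponent([[0.5, 0.5], [0.5, 0.5]]))  # fuzziest case
print(proportion_exponent([[0.8, 0.2], [0.2, 0.8]]))  # crisper, larger value
```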

The heuristic for P(U, c) possibly leads to an “optimal” final partition 
other than the heuristics of F(U, c) and/or of H(U, c). This might necessitate the 
use of other decision aids derived from the data themselves or from other con- 
siderations. Bezdek [1981] describes quite a number of other approaches in his 
book. 

Even though the fuzzy c-means algorithm (FCM) performs better in practice 
than crisp clustering methods, problems may still have features that cannot be 
accommodated by the FCM. Two of them shall be looked at briefly by way of example. 


Most crisp and fuzzy clustering algorithms seek in a set of data one or the other 
type of cluster shape (prototype). The type of prototype used determines the 
distance measurement criteria used in the objective function. Windham [1983] 
presented a general procedure that unifies and allows the construction of 
different algorithms using points, lines, planes, etc. as prototypes. These algorithms, 
however, normally fail if the pattern looked for is not in some sense compact. For 
instance, the patterns shown in figures 13-2b and 13-2c will hardly be found. Dave 
[1990] suggested an algorithm that can find rings or, in general, spherical shells 
in higher dimensions. His fuzzy shell clustering (FSC) algorithm modifies the 
variance criterion mentioned above (after example 13-4) by introducing the 
radius of the “ring” searched for, arriving at 

\[ \min z_s(\tilde{U}, v, r) = \sum_{i=1}^{c} \sum_{k=1}^{n} (\mu_{ik})^m (D_{ik})^2 \]

where 

\[ D_{ik} = \bigl| \, \|x_k - v_i\| - r_i \, \bigr| \]

$r_i$ is the radius of the cluster prototype shell, and all other symbols are as defined 
for the FCM algorithm. The algorithm itself has to be adjusted accordingly by 
including $r_i$. 

Figure 13-13. Clusters by the FSC. (a) Data set; (b) circles found by FSC; (c) 
data set; (d) circles found by FSC. 

Details are given in Dave [1990]. This algorithm also finds circles if the data 
are incomplete. Figure 13-13 shows examples of it from Dave. 
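
The effect of the shell distance can be illustrated by evaluating the modified criterion for points that lie exactly on two rings. The sketch covers only the objective function, not the full FSC iteration of Dave [1990]; the function names and the sample rings are assumptions.

```python
import math


def shell_distance(x, v, r):
    """D_ik: absolute difference between the distance to the center and the radius."""
    return abs(math.dist(x, v) - r)


def fsc_objective(X, U, centers, radii, m=2.0):
    """Sum over clusters i and points k of membership ** m times D_ik ** 2."""
    return sum(
        (U[i][k] ** m) * shell_distance(X[k], centers[i], radii[i]) ** 2
        for i in range(len(U))
        for k in range(len(X))
    )


# Hypothetical data: four points on a ring of radius 1 around (0, 0) and
# four points on a ring of radius 2 around (10, 0).
X = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0),
     (12.0, 0.0), (10.0, 2.0), (8.0, 0.0), (10.0, -2.0)]
U = [[1, 1, 1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1, 1, 1]]
centers = [(0.0, 0.0), (10.0, 0.0)]
print(fsc_objective(X, U, centers, radii=[1.0, 2.0]))  # 0.0: points lie on the shells
print(fsc_objective(X, U, centers, radii=[0.0, 0.0]))  # 20.0: point prototypes fit worse
```

Setting all radii to 0 recovers the point-prototype distance of the FCM, for which the same ring-shaped data yield a positive criterion value.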

Interesting applications can be found in Dave and Fu [1994]. 

The FCM as well as the FSC satisfies the constraint 

\[ \sum_{i=1}^{c} \mu_{ik} = 1, \quad 1 \leq k \leq n \]

which was used in definition 13-4 of a fuzzy c-partition. Considering data sets 
such as those shown in figure 13-14, this constraint would enforce that, for instance, 
the two points A and B would get the same degree of membership, $\mu = .5$, in clusters 
1 and 2. 

Figure 13-14. Data sets [Krishnapuram and Keller 1993]. 

The $\mu_{ik}$ would then express a kind of “relative membership” to the clusters, 
i.e., the membership of point B in cluster 1 compared to the membership of point 
B in cluster 2 (see also figure 13-14). From an observer’s point of view it might, 
however, be inappropriate to assign the same degrees of membership to points A 
and B because the observer interprets those as (absolute) degrees of membership, e.g., 
degrees to which points A or B belong to clusters 1 or 2, respectively. Krishnapuram 
and Keller [1993] suggest their possibilistic c-means algorithm (PCM) to 
compute the latter kind of degrees of membership for elements in clusters by 
modifying the definition of a fuzzy c-partition and, as a consequence, the objective 
function of the cluster algorithm. 

Definition 13-4 is modified to 

1. $\mu_{ik} \in [0, 1]$, $1 \leq i \leq c$, $1 \leq k \leq n$ 
2. $0 < \sum_{k=1}^{n} \mu_{ik} \leq n$, $1 \leq i \leq c$ 
3. $\max_{i} \mu_{ik} > 0$ for all $k$. 


Simply relaxing condition 2 in definition 13-4 in the FCM would produce the 
trivial solution, i.e., the objective function would drive all degrees of membership 
to 0. This result is certainly not meaningful. One would rather try to have 
the degrees of membership of data that belong strongly to clusters appropriately 
high and those that do not represent the features of the clusters well very low. 
This is achieved by the following objective function: 

\[ \min z(\tilde{U}, v) = \sum_{i=1}^{c} \sum_{k=1}^{n} (\mu_{ik})^m d_{ik}^2 + \sum_{i=1}^{c} \eta_i \sum_{k=1}^{n} (1 - \mu_{ik})^m \]

Here $d_{ik}$ can be the same distance as in the FCM, $\mu_{ik}$ are now the “absolute” 
degrees of membership, and $\eta_i$ are appropriately chosen positive numbers (see 
Krishnapuram and Keller [1993]). When applying such an algorithm to data sets 
as shown in figure 13-14, point A would obtain considerably higher degrees of 
membership than point B. 
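
The role of the second term can be seen by evaluating the objective function for a single cluster. The sketch below is illustrative only: the function names and numbers are assumptions, and the closed-form membership is what one obtains by setting the derivative of $(\mu_{ik})^m d_{ik}^2 + \eta_i (1 - \mu_{ik})^m$ with respect to $\mu_{ik}$ to zero, not a formula quoted from the text.

```python
def pcm_objective(U, D2, eta, m=2.0):
    """Possibilistic objective: weighted squared distances plus a penalty
    eta_i * (1 - membership) ** m that discourages all-zero memberships.
    U[i][k]: memberships, D2[i][k]: squared distances, eta[i]: positive weights."""
    fit = sum(
        (U[i][k] ** m) * D2[i][k]
        for i in range(len(U)) for k in range(len(U[0]))
    )
    penalty = sum(
        eta[i] * sum((1.0 - u) ** m for u in U[i]) for i in range(len(U))
    )
    return fit + penalty


def pcm_membership(d2, eta_i, m=2.0):
    """Stationary point of u**m * d2 + eta_i * (1 - u)**m in u (derived above)."""
    return 1.0 / (1.0 + (d2 / eta_i) ** (1.0 / (m - 1.0)))


D2 = [[1.0, 4.0]]  # squared distances of two points to a single cluster center
eta = [1.0]
print(pcm_objective([[0.0, 0.0]], D2, eta))  # all-zero memberships are penalized
best = [[pcm_membership(d2, eta[0]) for d2 in D2[0]]]
print(best, pcm_objective(best, D2, eta))    # smaller objective value
```

The resulting membership depends only on the distance of a point to the cluster in question, so a point close to a center obtains a higher degree than a distant one, regardless of the other clusters.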


13.2.2 Knowledge-Based Approaches 


Knowledge-based approaches closely resemble the procedures described in 
chapters 10 and 11. Figure 13-15 indicates the basic structure of knowledge- 
based classification. 

After the preprocessing, the data describing the elements are fed into an expert 
system. This contains in the knowledge base—in an appropriate fuzzy descrip- 
tion—the relevant features, which in the inference engine are aggregated per 
element. The results are either membership functions or possibly singletons. The 
“matching” function contains the description of the classes (fuzzy or crisp) and 
determines the similarity of the expert system output with the class description. 
An assignment of elements to classes occurs then either according to the respec- 
tive degrees of similarity or to the class with the highest degree of similarity. 

An example of such a data-mining system is described by Fei and Jawahir 
[1992]. The basic structure is given below. 

In a turning situation, the finish-turning operation involving the machining of 
a component at small feeds and at small depths of cut requires a number of major 
issues to be solved before the process can begin. The process of finish turning 
itself is so complex that it is practically impossible to establish any theoretical 
model that could precisely predict the machinability parameters. Here we shall 
only consider the relationship between depth of cut and feed on one hand and the 
resulting surface roughness on the other hand. 

Figure 13—16 shows the linguistic variables defining the relevant features on 
the input side. 

In this case the classes, i.e., surface roughness, are defined as intervals with 
linguistic labels as follows: 


zam | os | 611 | tas | 1520 | 2030 


The authors have modeled the uncertainty in this case by computing a kind of 
“uncertainty factor” that applies to the respective terms of the linguistic variable 
(classes). Alternatively, the classes could, of course, have been modeled by fuzzy 
sets, rather than by intervals, possibly in multidimensional space. 

Figure 13-16. Linguistic variables “Depth of Cut” and “Feed.” 

[Figure 13-17: table assigning surface-roughness classes with certainty values 
(for example 1.0/E, 0.8/G, 0.6/F) to combinations of the feed levels (columns 
F1, F2, ...) and the depth-of-cut levels Light (D1), Medium (D2), and Deep (D3).] 

E = Excellent, G = Good, F = Fair, A = Acceptable, P = Poor 

Work Material = AISI 1045        Chip Breaker = FCB4 
Cutting Speed = 230 m/min        Tool Insert = TNMG 160408 

Figure 13-17. Knowledge base. 

The knowledge base of this system is shown in figure 13-17 and the structure 
of the entire system in figure 13-18. 

[Figure 13-18 block labels: Tool Insert Type; Knowledge-Base for Predicting 
Surface Roughness in Finish Turning; Fuzzy Inference Engine; Surface Roughness 
Prediction in Terms of Linguistic Variables with Certainty Levels] 

Figure 13-18. Basic structure of the knowledge-based system. 


13.2.3 Neural Net Approaches 


Artificial neural nets (ANNs) have proven to be a very efficient and powerful tool 
for pattern recognition. The literature on types of ANNs and their applications to 
data analysis is abundant, and it would exceed the scope of this book to introduce 
the reader to this area. Since the beginning of the 1990s the relationship and 
the cross-fertilization of fuzzy set theory and artificial neural nets have grown 
stronger and stronger (see, for example, Lee [1975], Huntsberger [1990], Kosko 
[1992], Nauck et al. [1994], Kim and Choo [1994], and Kunchera [1994]). There 
are two reasons for this: (1) artificial neural nets are “classical” in the sense that 
originally their structure was dichotomous and a fuzzification has turned out to 
be useful in many cases, and (2) fuzzy set systems and ANNs are complementary 
in the sense that fuzzy systems are interpretable, plausible, and in a sense 
transparent (knowledge-based) systems, which, however, in general cannot learn. 
In other words, the knowledge has to be acquired first and then fed into the 
systems in the form of if-then rules or otherwise. ANNs, by contrast, have the 
“black box” character, i.e., they cannot be interpreted easily, but they can learn 
in a supervised or unsupervised fashion. 

It is obvious that it makes sense to combine the attractive features of these 
two approaches while trying to avoid their weaknesses. Unfortunately, it is also 
beyond the scope of this book to describe the various ways in which these two 
approaches have been combined. 


13.3 Dynamic Fuzzy Data Analysis 
13.3.1 Problem Description 


So far “objects” were considered to be elements or points (vectors) in the appro- 
priate spaces. 

The development of objects over time (and, therefore, the development of the 
features) is not considered explicitly or is taken into account by just using single 
values of the past in the feature vector. 

Methods that use this type of feature vectors can be called static. In many 
applications, however, explicit consideration of trajectories rather than single 
points is desirable, e.g.: 


• monitoring of patients in medicine, e.g. during narcosis, where the development 
  of the patients’ condition is essential; 

• state-dependent machine maintenance; 

• rating of shares: the examination of the development of share prices and other 
  characteristics allows better estimates than just considering the current share 
  price. 


In all cases, where a dynamic viewpoint is desirable, the momentary snapshot for 
some components of the feature vector may be replaced by a trajectory of this 
feature. Thus, dynamic objects are represented by multi-dimensional trajectories 
in the feature space. Since most methods for data analysis are not suited to clas- 
sify objects described by trajectories, new methods for dynamic data analysis 
were developed. 

Figure 13—19 illustrates the difference between classical (static) and dynamic 
data analysis. Consider a two-dimensional feature space with one additional time 
dimension and suppose that a set of objects is observed over time. States of 
objects at a point of time can be seen in the cut of this three-dimensional space 
at the current moment (figure 13—19a). Two classes of objects can easily be dis- 
tinguished in this plane. However, if the trace of each object from the initial to 
its current state (i.e. its trajectory) is considered and projected into the feature 
space (figure 13-19b), other classes may seem more reasonable. 

Figure 13-19. (a) States of objects at a point of time; (b) projections of trajectories 
over time into the feature space. 


One of the major problems is that “distances” or “similarities” used in cluster 
algorithms are defined with respect to pairs of points, but not with respect to pairs 
of functions (or vectors). 


13.3.2 Similarity of Functions 


As stated before, the components of feature vectors describing dynamic objects 
are trajectories. Starting from the fact that most methods for data analysis use a 
distance measure or a similarity measure as a criterion to classify objects, one 
way to handle dynamic objects is to define the similarity measure for trajectories 
(functions) and to use it within existing or perhaps completely new methods. 

Similarity of trajectories can be defined in different ways. Basically, two
viewpoints can be distinguished. Two trajectories are the more similar


• the better they match in form, evolution, or characteristics (structural
similarity);
• the smaller their (pointwise) distance in feature space is (pointwise
similarity).


Figure 13-20 gives an example of the differences between structural and point- 
wise similarity. In terms of pointwise similarity A and B would be grouped 
together as well as C and D. But in terms of structural similarity the grouping 
{A,D} and {B,C} seems to be more natural (depending on the chosen type 
of structural similarity). The following two sections describe these two types of 
similarity and the relationships between them. 


Structural Similarity between Functions. Structural similarity relates to a
variety of aspects of the trajectories (functions) under consideration: form,
evolution, size or orientation (of trajectories in $\mathbb{R}^n$) are some
examples. Depending on the chosen aspect, different criteria may be relevant to
describe similarity, e.g. slope, curvature, position and values of extremal
points or other information like smoothness or monotonicity (as a degree of
membership of a trajectory to the set of monotone functions).

Figure 13-20. Structural and pointwise similarity.

Here some examples of structural similarity are given for illustration: 

(A) Slope and curvature of trajectories are relevant, but their position in the 
feature space is not relevant: 

The functions y = x and y = 1.001*x + 100 are similar (both describe straight 
lines with approximately equal slope and a curvature of zero), whereas y = x and 
y = x + 0.001*sin(x) are not similar (despite the fact that they are much closer 
in terms of Euclidean distance). 

This type of definition of similarity can be applied to classify e.g. shares as
“decreasing” (A, E, H), “increasing” (B, D, G, J, K), “constant” (F, I) or
“fluctuating” (C), depending on the trajectories of their share prices
(figure 13-21).

(B) Form of trajectories is relevant, but their size and position in the feature space 
are not relevant: 

The unit circle (center at (0, 0) and radius 1) and the circle with center at (100, 
0) and radius 17.4 are similar to degree 1, whereas the unit circle and the unit square 
are much less similar. 

This type of definition may be applied to classify engines, using the airborne
sound they emit during operation: amplitude and position of characteristic
patterns change with speed, independent of the state of the engine. The
characteristic pattern itself, however, depends only on whether an engine is
intact or damaged (figure 13-22).

Figure 13-21. Fictitious developments of share prices.

Figure 13-22. Idealized characteristic patterns of time signals for (a) an intact
engine; (b) an engine with some defect.

In some cases, structural similarity can be reduced to pointwise similarity, for
instance in the first example, by considering the pointwise similarity for the
first and the second derivatives of the functions, respectively.

One method to define structural similarity between functions is to consider 
relevant characteristics of these functions (e.g. integral, extrema), which contain 
the information about the specific structure of functions. 

The following algorithm to determine a measure for structural similarity 
between arbitrary functions f and g is proposed: 


1. A set of relevant characteristics $K_i$, $i = 1, \ldots, m$, describing
structural similarity is chosen.

2. For each characteristic $K_i$, a fuzzy set $A_i$ labeled “admissible
difference for characteristic $K_i$” with membership function $\mu_i$ is defined.

3. All characteristics $K_i(f)$ for the function $f$ and $K_i(g)$ for the
function $g$ are calculated.

4. For each characteristic $K_i$ the difference
$\Delta K_i = |K_i(f) - K_i(g)|$, $i = 1, \ldots, m$, is calculated.

5. The degree of membership $s_i = \mu_i(\Delta K_i)$ of the difference
$\Delta K_i$ to the fuzzy set $A_i$ is calculated for each characteristic $K_i$.
These membership values can be interpreted as similarities between functions
$f$ and $g$ with respect to the chosen characteristics.

6. Finally the vector $[s_1, s_2, \ldots, s_m]$ of partial similarities is
transformed using specific transformations (e.g. $\gamma$-operator, fuzzy
integral, minimum, maximum) into a real number $s(f, g)$ expressing the overall
degree of similarity.
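
The six steps above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not the implementation used by the
author: the triangular shape of the "admissible difference" fuzzy sets, their
widths, the two characteristics (mean slope and maximal curvature of a sampled
function), and all function names are choices made for this example. It
reproduces the flavor of example (A): two nearly parallel straight lines come
out as similar, a straight line and a slightly oscillating one do not.

```python
import math

def admissible_difference(width):
    """Assumed triangular membership function mu_i for the fuzzy set
    "admissible difference": 1 at a difference of 0, 0 from `width` upward."""
    def mu(delta):
        return max(0.0, 1.0 - abs(delta) / width)
    return mu

def mean_slope(ys, dx):
    """Characteristic K_1: average slope over the sampled domain."""
    return (ys[-1] - ys[0]) / (dx * (len(ys) - 1))

def max_curvature(ys, dx):
    """Characteristic K_2: largest absolute second difference quotient."""
    return max(abs(ys[i - 1] - 2.0 * ys[i] + ys[i + 1])
               for i in range(1, len(ys) - 1)) / dx ** 2

def structural_similarity(f, g, characteristics, memberships, aggregate=min):
    """Steps 3 to 6: partial similarities s_i = mu_i(|K_i(f) - K_i(g)|),
    then aggregation (here: minimum) into the overall similarity s(f, g)."""
    partial = [mu(abs(K(f) - K(g)))
               for K, mu in zip(characteristics, memberships)]
    return aggregate(partial), partial

dx = 0.1
xs = [i * dx for i in range(101)]                  # sample points on [0, 10]
line_a = [x for x in xs]                           # y = x
line_b = [1.001 * x + 100.0 for x in xs]           # y = 1.001*x + 100
wavy = [x + 0.001 * math.sin(x) for x in xs]       # y = x + 0.001*sin(x)

K = [lambda ys: mean_slope(ys, dx), lambda ys: max_curvature(ys, dx)]
mu = [admissible_difference(0.1), admissible_difference(0.0005)]

s_lines, _ = structural_similarity(line_a, line_b, K, mu)
s_wavy, _ = structural_similarity(line_a, wavy, K, mu)
print(s_lines, s_wavy)
```

The result depends strongly on the widths of the admissible differences: with a
wider fuzzy set for the curvature, the oscillating function would also be rated
as similar to the straight line.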

To define structural similarity between functions, the following possible
characteristic values can be used:


1. Integral
2. Global minimum, maximum
3. Position of minima, maxima, zeros, inflection points
4. Number of minima, maxima, zeros, inflection points
5. Statistical characteristics
6. Parameters (if a family of parametric functions is under consideration)
7. Spline parameters (if spline approximation is used)
8. Fourier/Taylor/Wavelet coefficients
9. Range of function values
10. Median of function values
11. Center of gravity.


All these characteristic values may be calculated for the original function
(trajectory) as well as for any derived function (e.g. derivatives,
transformations, etc.). Furthermore, characteristics may be defined over the
whole domain or just over parts of it (e.g. maximum of the first derivative in
the domain 5 < x < 8).

The definition of structural similarity as well as the choice of relevant character- 
istics can be simplified if the class of possible functions (trajectories) is restricted. 


Pointwise Similarity between Functions. Pointwise similarity between functions
is concerned with the closeness of functions in the feature space and is based
on considering functional values directly (characteristics of the functions or
derived functions are not relevant in this case). The proposed method uses
similarity of the difference of two functions to the zero-function as a measure
of similarity for a pair of functions. That is, similarity between functions
$g(x)$ and $h(x)$ defined on the universe $X$ is determined as similarity between
the difference function $f(x) = g(x) - h(x)$, $x \in X$, and the zero-function:
$s(g, h) = s(g - h, 0) = s(f, 0)$.

The following algorithm to determine a measure for pointwise similarity 
between an arbitrary function f(x) and the zero-function is proposed: 


1. A fuzzy set $A$ “approximately zero” with a membership function $\mu$ is
defined (figure 13-23a). To emphasize the time focus, the variable $x$ is taken
to be time ($t$).

2. The degree of membership $\mu(f(x))$ of the function $f(x)$ to the fuzzy set
$A$ is calculated for each point $x \in X$. These degrees of membership can be
interpreted as (pointwise) similarities of the function $f(x)$ to the
zero-function (figure 13-23b).

3. The function $\mu(f(x))$ is finally transformed by using specific
transformations (e.g. $\gamma$-operator, fuzzy integral, minimum, maximum) into
a real number $s(f, 0)$ expressing the overall degree of being zero.
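
A minimal sketch of these three steps for sampled functions is given below. The
triangular membership function for "approximately zero", its width, and the
function names are assumptions made for illustration; minimum and arithmetic
mean stand in for the aggregation step, for which the text also allows other
operators. The last call checks numerically that adding the same function $c$
to both $g$ and $h$ leaves the similarity unchanged.

```python
def approximately_zero(width):
    """Assumed triangular membership function mu of the fuzzy set
    "approximately zero": mu(0) = 1, mu(y) = 0 for |y| >= width."""
    def mu(y):
        return max(0.0, 1.0 - abs(y) / width)
    return mu

def mean(values):
    """Arithmetic mean, used here as one possible aggregation."""
    return sum(values) / len(values)

def pointwise_similarity(g, h, mu, aggregate=min):
    """s(g, h) = s(g - h, 0): membership of each difference value in
    "approximately zero" (step 2), aggregated to one number (step 3)."""
    memberships = [mu(gv - hv) for gv, hv in zip(g, h)]
    return aggregate(memberships)

mu = approximately_zero(2.0)
g = [0.0, 1.0, 2.0, 3.0]          # samples of g at four points in time
h = [0.5, 1.0, 2.0, 2.5]          # samples of h at the same points
c = [10.0, -3.0, 7.0, 1.0]        # an arbitrary function added to both

s_min = pointwise_similarity(g, h, mu)
s_mean = pointwise_similarity(g, h, mu, aggregate=mean)
s_shifted = pointwise_similarity([a + b for a, b in zip(g, c)],
                                 [a + b for a, b in zip(h, c)], mu)
print(s_min, s_mean, s_shifted)
```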
Figure 13-23. (a) The fuzzy set “approximately zero” ($\mu(y)$), the function
$f(t)$ and the resulting pointwise similarity $\mu(f(t))$; (b) projection of
pointwise similarity into the plane $(t, \mu(f(t)))$.

All similarity measures obtained with the help of this algorithm are invariant
with respect to the addition of a function, i.e. $s(g, h) = s(g + c, h + c)$
holds for all functions $g$, $h$ and $c$, because
$(g + c) - (h + c) = g - h$. On the other hand, every similarity measure
satisfying the above equation can be described by defining pointwise similarity
between an arbitrary function and the zero-function.


The Case of Multi-dimensional Functions. The two algorithms presented above
were formulated to determine structural and pointwise similarity between
one-dimensional functions. The extension of these definitions for n-dimensional
functions $g(x)$ and $h(x)$, $x \in X_1 \times X_2 \times \cdots \times X_n$, is
straightforward, and will be explained based on the algorithm for pointwise
similarity. The modification of the algorithm can be performed in two ways:


1. Fuzzy sets $A_{X_i}$ “approximately zero” are defined on each subuniverse
$X_i$, $i = 1, \ldots, n$, and the similarity measures $s_{X_i}(g, h)$,
$i = 1, \ldots, n$, are determined according to the described algorithm for
projections of functions $g(x)$ and $h(x)$ on subuniverses. The result is the
n-dimensional vector of similarities
$[s_{X_1}, s_{X_2}, \ldots, s_{X_n}]$.

2. The n-dimensional fuzzy set $A$ “approximately zero” is defined on
$X_1 \times X_2 \times \cdots \times X_n$ and the similarity measure
$s_{X_1 \times X_2 \times \cdots \times X_n}(g, h)$ is obtained for n-dimensional
functions analogously to the one-dimensional case.


For some classification methods it could be desirable to transform the
similarity measure into a distance measure using e.g. the relation:

$$d(g, h) = \frac{1}{s(g, h)} - 1. \qquad (13.3)$$

In the first case, when n one-dimensional fuzzy sets are given, the
transformation can be performed in two ways:


1. The distance measure is calculated for the components of the n-dimensional
vector $[s_{X_1}, s_{X_2}, \ldots, s_{X_n}]$ resulting in the vector
$[d_{X_1}, d_{X_2}, \ldots, d_{X_n}]$. The latter is then transformed into an
overall distance using e.g. the Euclidean norm:

$$d(g, h) = \sqrt{\sum_{i=1}^{n} d_{X_i}^2}.$$

2. The n-dimensional vector $[s_{X_1}, s_{X_2}, \ldots, s_{X_n}]$ is
transformed by using some transformations (e.g. $\gamma$-operator, fuzzy
integral, minimum, maximum) into an overall similarity $s(g, h)$. Thereafter the
distance measure is calculated e.g. by (13.3).
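
The two ways of obtaining an overall distance can be compared on a small
numerical example. The sketch below assumes that the component similarities are
already available and strictly positive; the function names are chosen for this
illustration only, and the minimum is used as one possible aggregation in the
second variant.

```python
import math

def similarity_to_distance(s):
    """Relation (13.3): d = 1/s - 1, defined for strictly positive s."""
    if s <= 0.0:
        raise ValueError("similarity must be strictly positive")
    return 1.0 / s - 1.0

def distance_via_components(similarities):
    """Way 1: convert each component similarity to a distance, then take
    the Euclidean norm of the resulting distance vector."""
    return math.sqrt(sum(similarity_to_distance(s) ** 2
                         for s in similarities))

def distance_via_overall_similarity(similarities, aggregate=min):
    """Way 2: aggregate the similarities first, then apply (13.3)."""
    return similarity_to_distance(aggregate(similarities))

component_similarities = [0.5, 0.8, 1.0]   # s_X1, s_X2, s_X3 (example values)
d_way1 = distance_via_components(component_similarities)
d_way2 = distance_via_overall_similarity(component_similarities)
print(d_way1, d_way2)
```

The two variants generally give different numbers (here roughly 1.03 and 1.0),
so the choice between them is part of the modeling decision.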


The obtained distance measure between n-dimensional functions g and h can 
be used as a criterion within classical methods for data analysis, allowing the 
classification of multi-dimensional trajectories. This topic will be discussed in the
next section in more detail. 


13.3.3 Approaches for Analysing Dynamic Systems


In the following, two different methods for the handling of dynamics within exist- 
ing methods for data analysis are considered: 


a) During preprocessing: feature vectors containing trajectories are
preprocessed so that they become valid inputs for classical methods such as
fuzzy c-means;

b) Within the data analysis methods: classical methods are modified, so that
they can process feature vectors containing trajectories directly.


Since the modifications of the classical methods do not directly affect the way 
clusters are built, the resulting methods are basically static. But they are suited 
to process dynamic objects. Each approach is handled separately in the next two 
sections. 


Handling of Trajectories during Preprocessing. The goal of preprocessing is 
the preparation and representation of the measured data in order to make the clas- 
sification possible and improve classification results [Famili et al. 1997]. In many 
data analysis tools, methods for preprocessing are integrated [MIT Data Engine 
2.1 Manual 1997]. These methods include transformations of data such as 
calculation of the power spectrum from the time signal, computation of different 
characteristics or scaling / standardization of the data. Thus, usually preprocess- 
ing is performed along with feature selection. 

The easiest way to integrate dynamic features into existing methods for static
data analysis is to transform trajectories into real numbers (characteristic
values) and to use the latter instead of the original trajectories, i.e. vector
valued features are replaced by one or more real numbers. This leads to
conventional feature vectors, which can be processed by classical methods. This
idea is illustrated in figure 13-24, where $X_1, X_2, \ldots, X_N$ denote
features represented as trajectories or vectors and $C_i(X_j)$,
$i = 1, \ldots, L_j$, $j = 1, \ldots, N$, is the i-th characteristic value for
feature j.

It should be noted that the number $L_j$, $j = 1, \ldots, N$, and type of
characteristic values can vary for different features. Since this approach does
not require any modifications of the classification methods used, it can very
easily be used in conjunction with different methods for data analysis. The
following approach requires a modification of the classification methods, but
does not use any characteristic values.
Figure 13-24. Transformation of a feature vector containing trajectories into a
usual feature vector.
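
The transformation shown in figure 13-24 can be sketched as follows. The
example assumes that each trajectory is given as a list of sampled values; the
chosen characteristic values (minimum, maximum, median, integral, range), the
sample data, and the function names are illustrative assumptions. As noted
above, the number and type of characteristic values may differ from feature to
feature.

```python
import statistics

def integral(ys, dt=1.0):
    """Trapezoidal approximation of the integral of a sampled trajectory."""
    return sum((a + b) / 2.0 * dt for a, b in zip(ys, ys[1:]))

def value_range(ys):
    """Range of function values (maximum minus minimum)."""
    return max(ys) - min(ys)

def to_static_feature_vector(trajectories, extractors):
    """Replace trajectory X_j by its characteristic values C_1(X_j), ...,
    C_Lj(X_j) and concatenate them into one conventional feature vector."""
    vector = []
    for trajectory, functions in zip(trajectories, extractors):
        vector.extend(float(fn(trajectory)) for fn in functions)
    return vector

price = [10.0, 12.0, 11.0, 15.0]     # trajectory of feature 1 (example data)
volume = [3.0, 3.0, 9.0]             # trajectory of feature 2 (example data)
extractors = [[min, max, statistics.median, integral],  # L_1 = 4
              [value_range]]                            # L_2 = 1

vector = to_static_feature_vector([price, volume], extractors)
print(vector)
```

The resulting list of real numbers can be passed unchanged to any static
method such as fuzzy c-means.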


Handling of Trajectories within Data Analysis Methods. In the previous 
section, the problem of using trajectories is circumvented by reducing each tra- 
jectory to a vector of characteristic values. 

In the following, another approach to handle dynamics is proposed, which is 
based on similarity between functions. First, some basic remarks related to the 
notions of distance and similarity are given. 

Many data analysis methods (e.g. fuzzy c-means [Bezdek 1981], possibilistic
c-means [Krishnapuram and Keller 1993], (fuzzy-) Kohonen networks
[Rumelhart and McClelland 1988]) use the distance between pairs of feature
vectors describing objects as a measure of similarity between these objects.
Starting with a distance d(g, h) between objects, a similarity relation can be
defined by s(g, h) = 1/(1 + d(g, h)) [Bandemer and Näther 1992]. Conversely,
each strictly positive similarity relation defines a distance measure
d(g, h) = 1/s(g, h) - 1.

All data analysis methods mentioned above use nothing else but the distance
between objects and class representatives to calculate degrees of membership of
objects to classes. The positions of objects in the feature space are used to
determine representatives of each class. Therefore, it is sufficient to provide
a distance for pairs of objects and / or class representatives to be able to
calculate degrees of class membership. These considerations were used to develop
a modified version of the fuzzy c-means algorithm, which is called the
functional fuzzy c-means (FFCM) and is able to classify dynamic objects (i.e.
objects described by trajectories). Since the features are trajectories, the
class centers calculated by the FFCM are not just points in the feature space,
as in classical fuzzy c-means, but consist themselves of trajectories. This idea
is illustrated in figure 13-25, where for the sake of simplicity objects are
represented by only one feature.

Figure 13-25. Input and output of the functional fuzzy c-means.

The functional fuzzy c-means algorithm (FFCM) is very similar to the standard
fuzzy c-means (FCM). In the following we present the FFCM and point out where
it differs from the FCM.

The problem of finding fuzzy clusters of trajectories in the feature space can
be formulated as the minimization of an objective function $J(B, U; X)$ of the
form

$$J(B, U; X) = \sum_{i=1}^{c} \sum_{j=1}^{N} (u_{ij})^m \, d^2(x_j, b_i)$$

with the following parameters


$c$ = number of clusters

$N$ = number of objects

$m$ = fuzzifier (weighting exponent)

$u_{ij}$ = degree of membership of object $j$ to class $i$

$d^2(x_j, b_i)$ = squared distance between object $j$ and the class center of
class $i$

$x_j$ = feature vector describing object $j$

$b_i$ = class center of class $i$


It should be noted that in the case of the FFCM the components of the feature
vector of object $x_j$ and of class center $b_i$ are trajectories in the feature
space. The distance measure for trajectories described in section 13.3.2 is used
for the calculation of $d^2(x_j, b_i)$.

The algorithm for solving the described problem consists of the following
steps:


1. Initialization
Generate values $u_{ij}$ for $i = 1, \ldots, c$ and $j = 1, \ldots, N$ such that

$$\sum_{i=1}^{c} u_{ij} = 1 \quad \forall j = 1, \ldots, N.$$

2. Determination of class centers $b_i$

$$b_i = \frac{\sum_{j=1}^{N} (u_{ij})^m \, x_j}{\sum_{j=1}^{N} (u_{ij})^m},
\quad i = 1, \ldots, c.$$

Remark: The product and the sum are calculated for each component of each
trajectory of the feature vectors.

3. Recalculation of membership values $u_{ij}$
This is the main difference between the FFCM and the FCM. The FFCM calculates
the distances $d_{ij} = d(x_j, b_i)$ and $d_{kj} = d(x_j, b_k)$ using the
distance measure for trajectories:

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{d_{ij}}{d_{kj}} \right)^{2/(m-1)}},
\quad i = 1, \ldots, c, \; j = 1, \ldots, N.$$

4. Stopping criterion
There exist many possible stopping criteria. One is to repeat steps 2 to 4
until the changes in the membership values between two iterations are smaller
than a fixed threshold.
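
The four steps can be put together in a short sketch. It is not the author's
implementation: it assumes objects with a single feature whose trajectory is a
list of equally spaced samples, a strictly positive membership function
$\mu(y) = 1/(1 + (y/w)^2)$ for "approximately zero", the arithmetic mean as
aggregation of pointwise similarities, and the relation d = 1/s - 1 as distance.
All names (`mu_zero`, `trajectory_distance`, `ffcm`) are hypothetical.

```python
import random

def mu_zero(y, width=1.0):
    """Assumed strictly positive membership in "approximately zero"."""
    return 1.0 / (1.0 + (y / width) ** 2)

def trajectory_distance(x, b, width=1.0):
    """d = 1/s - 1, with s the mean pointwise similarity of x - b to zero."""
    s = sum(mu_zero(xv - bv, width) for xv, bv in zip(x, b)) / len(x)
    return 1.0 / s - 1.0

def ffcm(X, c, m=2.0, tol=1e-6, max_iter=100, seed=0):
    rng = random.Random(seed)
    N, T = len(X), len(X[0])
    # Step 1: random memberships U[i][j]; each column j is normalized to 1.
    U = [[rng.random() + 1e-9 for _ in range(N)] for _ in range(c)]
    for j in range(N):
        total = sum(U[i][j] for i in range(c))
        for i in range(c):
            U[i][j] /= total
    B = []
    for _ in range(max_iter):
        # Step 2: class centers are membership-weighted mean trajectories,
        # computed separately for every sample point t.
        B = []
        for i in range(c):
            w = [U[i][j] ** m for j in range(N)]
            B.append([sum(w[j] * X[j][t] for j in range(N)) / sum(w)
                      for t in range(T)])
        # Step 3: new memberships from the trajectory distances d_ij.
        new_U = [[0.0] * N for _ in range(c)]
        for j in range(N):
            d = [trajectory_distance(X[j], B[i]) for i in range(c)]
            exact = [i for i in range(c) if d[i] == 0.0]
            for i in range(c):
                if exact:  # object coincides with at least one center
                    new_U[i][j] = 1.0 / len(exact) if i in exact else 0.0
                else:
                    new_U[i][j] = 1.0 / sum(
                        (d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
        # Step 4: stop when the memberships hardly change any more.
        change = max(abs(new_U[i][j] - U[i][j])
                     for i in range(c) for j in range(N))
        U = new_U
        if change < tol:
            break
    return B, U

# Two rising and two falling trajectories (example data).
X = [[0.0, 1.0, 2.0, 3.0], [0.1, 1.1, 2.1, 3.1],
     [3.0, 2.0, 1.0, 0.0], [3.1, 2.1, 1.1, 0.1]]
centers, U = ffcm(X, c=2)
labels = [max(range(2), key=lambda i: U[i][j]) for j in range(len(X))]
print(labels)
```

Only the distance computation in step 3 differs from the standard FCM; the
center update of step 2 is the usual weighted mean, applied to every sample
point of the trajectories.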


Examples of applying the FFCM to managerial and to engineering problems will 
be shown in chapter 15. 


13.4 Tools for Fuzzy Data Analysis


13.4.1 Requirements for FDA Tools


In section 13.2, three classes of methods, primarily for classifier design and clas- 
sification, were described in various degrees of detail. Each of these classes con- 
tains numerous methods, the suitability of which depends on the structure of the 
problem to be solved. In addition, and not described here, one needs methods for 
feature analysis such as fuzzy regression analysis, fuzzy discriminant analysis, 
etc. (for more details, see, for example, Bezdek and Pal [1992]). In other words, 
the tools needed for FDA are much more heterogeneous than those needed for 
fuzzy control as described in chapter 11. 

One of the most serious problems is that very often one only knows which
tool is the most suitable one after the problem has been solved. Only general
guidelines are known, such as: If the shape and the number of patterns one is
looking for are known, then an appropriate cluster method might best be employed.
If the knowledge is available as expert knowledge but not mathematically, then
a knowledge-based approach might be the best. And if this information is hidden
in a large mass of available data, then an ANN might be trainable to solve the
problem.

The only possible way, then, to perform FDA efficiently is to have a variety
of methods readily available on a computer in order to find out by an intelligent
trial-and-error method which of the methods is best suited to a specific case. This
approach, however, amounts to having case tools similar to those already
described for fuzzy control in chapter 11. There are only two differences: (1)
Instead of only a shell for knowledge-based inference, now the methods of all
three groups described in section 13.2 have to be included, and (2) since the
input data themselves are often the object of analysis and since they often are
not in a suitable form to be analyzed, methods for data preprocessing also have
to be included.


Data Preprocessing. If, for example, in quality control some acoustic signals 
have to be investigated, it becomes necessary to filter these data in order to over- 
come the problems of noisy input. In addition to these filter methods, some trans- 
formations of the measured data such as, for example, fast Fourier transformation 
(FFT) could improve the respective results. Both filter methods and FFT belong 
to the class of signal processing techniques. Data preprocessing includes signal 
processing and also conventional statistical methods. 
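
As an illustration of such a preprocessing chain, the sketch below smooths a
time signal, computes its amplitude spectrum, and sums the amplitudes in given
frequency intervals (the kind of feature used in the acoustic application of
section 13.5.2). It is a minimal sketch under assumptions: a plain discrete
Fourier transform stands in for the FFT (same result, slower), a moving average
stands in for the filter, and the band limits and function names are invented
for the example.

```python
import cmath
import math

def moving_average(signal, window=3):
    """Simple smoothing filter; the window shrinks at the signal borders."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def amplitude_spectrum(signal):
    """Magnitudes of the discrete Fourier transform for frequency bins
    0 .. n/2, scaled by 1/n (an FFT would compute the same values)."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) / n
            for k in range(n // 2 + 1)]

def band_sums(spectrum, bands):
    """Feature values: sum of amplitudes in each interval [lo, hi) of bins."""
    return [sum(spectrum[lo:hi]) for lo, hi in bands]

n = 32
tone = [math.sin(2.0 * math.pi * 4 * t / n) for t in range(n)]  # 4 cycles
spectrum = amplitude_spectrum(tone)
features = band_sums(spectrum, [(0, 3), (3, 6), (6, 17)])
smoothed = moving_average([0.0, 3.0, 0.0, 3.0, 0.0])
print(features, smoothed)
```

For the pure tone used here, practically all amplitude falls into the second
band, which contains frequency bin 4.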

Statistical approaches could be used to detect relationships within a data set
describing a special kind of application. Here correlation analysis, regression
analysis, and discriminant analysis can be applied adequately. These methods
could be used, for example, to facilitate the process of feature extraction. If,
say, two features from the set of available features are highly correlated, it
could be sufficient for a classification to consider just one of these.
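
The remark on correlated features can be illustrated with a small sketch that
keeps a feature only if it is not highly correlated with a feature already
kept. The threshold of 0.95, the greedy order of selection, the example data,
and the function names are assumptions made for this illustration; the sketch
also assumes that no feature is constant.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists
    (assumes that neither list is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def select_uncorrelated(features, threshold=0.95):
    """Greedy selection: keep a feature only if its absolute correlation
    with every feature kept so far stays below the threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

features = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.1],   # almost a multiple of "a"
    "c": [4.0, 1.0, 3.0, 2.0],
}
kept = select_uncorrelated(features)
print(kept)
```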

The differences between an FC tool and an FDA tool are probably responsi- 
ble for the fact that hardly any FDA tools are yet available on the market. In the 
following section, we briefly describe the only one known so far. 


13.4.2 DataEngine 


DataEngine is a software tool that contains the methods for data analysis
described above (see figure 13-26). In particular, the combination of signal
processing, statistical analysis, and intelligent systems for classifier design
and classification leads to a powerful software tool that can be used in a very
broad range of applications.


Figure 13-26. Structure of DataEngine. (Panels: data preprocessing; algorithmic,
knowledge-based, and neural data analysis; input from file, serial ports, data
acquisition boards, data editor, and data generator; output to file, serial
ports, data acquisition boards, printer, and 2D and 3D graphics; C++ precompiler
for algorithmic classifiers, rule-based systems, and neural nets; hardware
platforms: IBM-compatible (MS Windows), Sun SPARC II (Motif), and other
platforms; user interface with graphical programming and interactive and
automatic modes.)
DataEngine is written in C++ following an object-oriented design and runs on
all common hardware platforms. Interactive and automatic operation supported by
an efficient and comfortable graphical user interface facilitates the
application of data analysis methods. In general, such applications are
performed in the following three steps:


1. Modeling a specific application with DataEngine. Each subtask in an
overall data analysis application is represented by a so-called function block
in DataEngine. Such function blocks represent software modules that are
specified by their input interfaces, output interfaces, and function. Examples
include a certain filter method or a specific cluster algorithm. Function blocks
could also be hardware modules such as neural network accelerator boards.
This leads to a very high performance in time-critical applications.

2. Classifier design (off-line data analysis). After having modeled the
application in DataEngine, off-line analysis has to be performed with given data
sets to design the classifier. This task is done without process integration.

3. Classification. Once the classifier design is finished, the classification of
new objects can be executed. Depending on specific requirements, this step can
be performed in an on-line or off-line mode. If data analysis is used for
decision support (e.g., in diagnosis or evaluation tasks), objects are
classified off-line. Data analysis could also be applied to process monitoring
and other problems where on-line classification is crucial. In such cases,
direct process integration is possible by the configuration of function blocks
for hardware interfaces (see figure 13-27).


DataEngine provides the following models for intelligent data analysis: 


• Fuzzy Rule Base
Fuzzy rule bases allow the representation of linguistic human knowledge in
a computer. The fuzzy inference procedure is able to reproduce human decision
behavior. Applications are knowledge-based diagnosis, classification tasks,
control, and process modeling. Especially for data analysis tasks, the
DataEngine implementation offers a multistage inference procedure as well
as the ability to work with symbolic variables.

• Multilayer Perceptron
The multilayer perceptron is a supervised learning neural network.
Applications are classification tasks, process modeling and control. In addition
to the backpropagation learning rule with momentum and decay, DataEngine
provides the quickpropagation learning rule. A configurable learning
rate decay is implemented to avoid overfitting of the neural network.
The integrated pruning algorithm supports finding the optimal network
architecture.

• Kohonen Feature Map
The self-organizing feature map of Kohonen is an unsupervised learning neural
network, which learns the structures inside the presented data. Applications
of this neural network are classification tasks and knowledge discovery.
Especially for classification tasks, DataEngine provides an example-based
labeling algorithm; knowledge discovery is supported by the graphical
visualization of the feature map.

• Fuzzy C-Means
The fuzzy c-means algorithm [Bezdek 1981] is a fuzzy clustering algorithm.
Its applications are clustering and classification tasks. Especially for
classification tasks, DataEngine provides an example-based labeling algorithm.

• Fuzzy Kohonen Network
The Fuzzy Kohonen Network is a synthesis of Kohonen’s feature
maps and the fuzzy c-means algorithm. The use of a slightly modified fuzzy
c-means inside the training algorithm of the network dramatically reduces
training times. Applications of this method are clustering and classification
tasks.

Figure 13-27. Screen shot of DataEngine.


Each of these methods comes along with its own specialized editor. The 
editors offer simple and fast access to all parameters of the model and the 
model state can be visualized in several specialized views. All editors are 
structured similarly so that the training period for a new method is reasonably 
short. 

In addition to the provided models, DataEngine supplies signal processing
functions such as the fast Fourier transformation, smoothing, and digital
filtering. Statistical and mathematical functions as well as a spreadsheet-based
data editor support data preprocessing. The so-called cards represent a
graphical macro language that can be used for the automation of tasks carried
out repeatedly. DataEngine 2.1 is fully integrated into the Microsoft Windows
environment and thus provides features like data exchange via the clipboard and
makes full use of the Microsoft Windows printing capabilities.

The software package is extendible by so-called user-defined function
blocks. A user-defined function block is a special Microsoft Windows DLL
(Dynamic Link Library) which has to conform to the DataEngine PlugIn
interface.

There are three third-party plug-ins available for DataEngine, which use the
interface described above. A short description of these products follows:


• FeatureSelector PlugIn
The FeatureSelector PlugIn is a tool for automatic feature selection in case of
classification tasks. Given a number of examples, the FeatureSelector searches
for the most significant set of features that solves the classification task.
For the best solutions the tool generates appropriate training data files for
DataEngine.

• Advanced Clustering Library PlugIn
The Advanced Clustering Library PlugIn provides nine additional clustering
algorithms for DataEngine. The package contains the clustering algorithms
Gustafson-Kessel, Gath-and-Geva and Fuzzy C-Means, which are implemented in
several variations (probabilistic, possibilistic, parallel to axis).


13.5 Applications of FDA


Applications of data analysis abound. Recently, fuzzy data analysis of various
kinds has been applied to character recognition [Shao and Wu 1990], intelligence
[Guo and Zhang 1990], market segmentation, and many other areas. Here, two
applications are described in which the tool described above has been used.


13.5.1 Maintenance Management in Petrochemical Plants 


Problem Formulation. Over 97% of the worldwide annual commercial production
of ethylene is based on thermal cracking of petroleum hydrocarbons with
steam. This process is commonly called pyrolysis or steam cracking. Naphtha,
which is obtained by the distillation of crude oil, is the principal raw
material for ethylene. Boiling ranges, densities, and compositions of naphtha
depend on crude oil quality.

Naphtha is heated in cracking furnaces up to 820°C to 840°C, where the chemical
reaction starts. The residence time of the gas stream in the furnace is
determined by the severity of the cracking process. The residence time for low
severity is about 1 s and for high severity 0.5 s. The severity of the cracking
process specifies the product distribution. With high-severity cracking, the
amount of ethylene in the product stream is increased and the amount of
propylene is decreased significantly.

During the cracking process, acetylenic, diolefinic, and aromatic compounds
are also produced, which are known to deposit coke on the inside surfaces of the
furnace tubes. This coke layer inhibits heat transfer from the tube to the process
gas, and therefore at some time the furnace must be shut down to remove
the coke. To guarantee a continuous run of the whole plant, several furnaces are
integrated in parallel into the production process. The raw process data
measured on-line are not suitable for determining the degree of coking. About 20
different measurements of different indicators, such as temperatures, pressures,
or flows, are taken every minute. On the basis of these data only, it is not
possible for the operator to decide whether the furnace is coked or not. His or
her experience and the running time of the furnace in question are the basis for
this decision.


Solution by Data Analysis. Clustering methods compress the information in 
data sets by finding classes that can be used for classification. Similar objects are 
assigned to the same class. In the present case, “objects” are different states of a 
cracking furnace during a production period. Objects are described by different 
features. Features are the on-line measured quantities, such as temperatures, etc. 
Figure 13-28 shows the structure of the cracking furnace under consideration.
Features describing the process are primarily temperatures and flows. The
classes are “coked state” and “decoked state.” Fuzzy cluster methods were used
to determine the coking of 10 cracking furnaces of a thermal cracker. The
data of one year have been analyzed. The process of coking lasts about 60
days. Therefore only daily mean values of the measured quantities were
considered. For different furnaces, the centers of coked and decoked classes
were found by searching for coked and decoked states in the data set.
Figure 13-29 shows the temperature profile of a furnace during the whole
year. Characteristic peaks, where temperature decreases significantly, result
from decoking processes. K1 and K2 describe decoked and coked states of the
furnace.

The temperature profile shows no characteristic shape that results from coking.
Furnace temperature, shown in figure 13-29, is only one of the features. There
are dependencies between features, so a determination of coking through
consideration of only the feature “temperature” is not possible.

Figure 13-30 shows the membership values of a furnace state during a production
period using the classifier. The values describe the membership of the
current furnace state in the coked class. The membership values increase
continuously and reach nearly 1 at the end of the production period.

The classifier works on-line and classifies the current furnace state with ref- 
erence to the coking problem. The operator can use this information to check how 
long the furnace under consideration will be able to run until it has to be decoked. 
As a result, it becomes easier to make arrangements concerning logistical ques- 
tions, e.g., ordering the correct amounts of raw material or not being understaffed 
at certain times. 


13.5.2 Acoustic Quality Control 


In acoustic quality control, many efforts have been undertaken to automate the 
respective control tasks that are usually performed by humans. 

Even if there are many computerized systems for automatic quality control via 
analysis of acoustic signals, some of the problems cannot be solved adequately 
yet. Below, an example of acoustic control of ceramic goods is presented to show 
the potentials of fuzzy data analysis in this respect. 


Problem Formulation. In cooperation with a producer of tiles, a prototype has
been built that shows the potentials of automatic quality control. At this point, an
employee of this company has to check the quality of the final product by hitting
it with a hammer and deciding about the quality of the tile based on the
resulting sound. Since cracks in the tile cause an unusual sound, an experienced
worker can distinguish between good and bad tiles.

Figure 13-28. Cracking furnace.

Figure 13-29. Furnace temperature.

Figure 13-30. Fuzzy classification of a continuous process.


Solution Process. In this application, algorithmic methods for classifier design 
and classification were used to detect cracks in tiles. In the experiments, the tiles 
are hit automatically, and the resulting sound is recorded via a microphone and 
an A/D-converter. 

Then signal processing methods like filtering and the fast Fourier transform
(FFT) convert these sound data into a spectrum that can be analyzed. For
example, the time signal is transformed by an FFT into the frequency spectrum.
From this frequency spectrum, several characteristic features are extracted that
can be used to distinguish between good and bad tiles. The feature values are
the sums of amplitude values in some specified frequency intervals. In the
experiments, a six-dimensional feature vector showed the best results. After this
feature extraction, the fuzzy c-means algorithm found fuzzy classes that could be
interpreted as good and bad tiles. Since a crisp distinction between these two
classes is not always possible, fuzzy cluster techniques have an advantage: they
not only distinguish bad from good tiles but also allow intermediate qualities to
be defined (see figure 13-31).
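
The feature extraction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the sampling rate, the test tone, and the six frequency intervals are hypothetical (the text does not list the intervals actually used), and a plain discrete Fourier transform stands in for a production FFT routine.

```python
import cmath
import math


def amplitude_spectrum(signal):
    """Return |X[k]| for k = 0 .. N//2 using a plain O(N^2) DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                  for t, s in enumerate(signal))
        spectrum.append(abs(acc))
    return spectrum


def band_features(signal, sample_rate, bands):
    """Sum the amplitude values inside each (low_hz, high_hz) interval."""
    n = len(signal)
    spectrum = amplitude_spectrum(signal)
    features = []
    for low, high in bands:
        total = sum(a for k, a in enumerate(spectrum)
                    if low <= k * sample_rate / n < high)
        features.append(total)
    return features


# Hypothetical recording: a 1 kHz tone sampled at 8 kHz for 64 samples.
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(64)]

# Six hypothetical frequency intervals in Hz, giving a six-dimensional vector.
BANDS = [(0, 500), (500, 1500), (1500, 2500),
         (2500, 3000), (3000, 3500), (3500, 4001)]

features = band_features(signal, SAMPLE_RATE, BANDS)
```

A vector of this kind, computed for every recorded tile, would be the input to the fuzzy c-means step; for the test tone, almost all of the amplitude falls into the second interval.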
Exercises


1. Describe three example problems from the areas of engineering and 
management, each of which can be considered as a problem of pattern 
recognition. 

2. How is the dimensionality of the data space reduced in pattern recognition?

3. What is the center of a cluster and how can it be defined?

4. Which basic types of objective-function algorithms exist in cluster analysis?

5. Consider the following fuzzy graph:

Determine the clusters of the graph in dependence of the T-degree (cf. figure
13-6).

6. Let X = {x1, x2, x3, x4} and let each xi be a point in three-dimensional space.
Determine all 3-partitions that are possible and display them as shown in 
example 13-1. 

7. Give three possible fuzzy three-partitions for the problem given in exercise 
6. 

8. Let X= {(1, 1), (1, 3), (10, 1), (10, 3), (5, 2)} be a set of points in the plane. 
Determine a crisp 3-partition that groups together (1, 3) and (10, 3) and that 
minimizes the Euclidean norm metric. Do the same for the variance 
criterion. 

9. Determine the cluster validity of the clusters shown in figures 13—11 and 
13-12 by computing the partition coefficient and the partition entropy. 







14 DECISION MAKING IN
FUZZY ENVIRONMENTS


14.1 Fuzzy Decisions 


The term decision can have very many different meanings, depending on whether 
it is used by a lawyer, a businessman, a general, a psychologist, or a statistician. 
In one case it might be a legal construct, and in another a mathematical model; 
it might also be a behavioral action or a specific kind of information processing. 
While some notions of a “decision” have a formal character, others try to describe 
decision making in reality. 

In classical (normative, statistical) decision theory, a decision can be charac- 
terized by a set of decision alternatives (the decision space); a set of states of 
nature (the state space); a relation assigning to each pair of a decision and state 
a result; and finally, the utility function that orders the results according to their 
desirability. When deciding under certainty, the decision maker knows which state 
to expect and chooses the decision alternative with the highest utility, given the 
prevailing state of nature. When deciding under risk, he does not know exactly 
which state will occur; he only knows a probability function of the states. Then 
decision making becomes more difficult. We shall restrict our attention to 
decision making under certainty. In this instance, the model of decision making 
is nonsymmetric in the following sense: The decision space can be described
either by enumeration or by a number of constraints. The utility function orders
the decision space via the one-to-one relationship of results to decision alterna- 
tives. Hence we can only have one utility function, supplying the order, but we 
may have several constraints defining the decision space. 


Example 14-1 


Let us assume that the board of directors wants to determine the optimal 
dividend. Their objective function (utility function) is to maximize the dividend. 
The constraint defining the decision space is that the dividend be between zero 
and 6%. Hence the optimal dividend is “Between 0 and 6%” and “maximal.” 
(The constraint does not impose an order on the decision space!) The optimal 
dividend will obviously be 6%. Assigning a linear utility function, figure 14-1 
illustrates these relationships. 

In 1970 Bellman and Zadeh considered this classical model of a decision and 
suggested a model for decision making in a fuzzy environment that has served 
as a point of departure for most of the authors in “fuzzy” decision theory. They 
consider a situation of decision making under certainty, in which the objective 
function as well as the constraint(s) are fuzzy, and argue as follows: The fuzzy 
objective function is characterized by its membership function, and so are the 
constraints. Since we want to satisfy (optimize) the objective function as well as 
the constraints, a decision in a fuzzy environment is defined by analogy to 
nonfuzzy environments as the selection of activities that simultaneously satisfy
objective function(s) and constraints. According to the above definition and
assuming that the constraints are “noninteractive,” the logical “and” corresponds
to the intersection. The “decision” in a fuzzy environment can therefore be viewed
as the intersection of fuzzy constraints and fuzzy objective function(s). The
relationship between constraints and objective functions in a fuzzy environment is
therefore fully symmetric, that is, there is no longer a difference between the
former and the latter.

Figure 14-1. A classical decision under certainty. (Labels: Utility, Constraint, Optimal Decision; horizontal axis: Dividend (%), 0 to 6.)

This concept is illustrated by the following example [Bellman and Zadeh 1970, 
B-148]: 


Example 14-2 


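Objective function “x should be substantially larger than 10,” characterized by
the membership function

$$
\mu_{\tilde O}(x) =
\begin{cases}
0 & x \le 10 \\
\left(1 + (x - 10)^{-2}\right)^{-1} & x > 10
\end{cases}
$$

Constraint “x should be in the vicinity of 11,” characterized by the
membership function

$$
\mu_{\tilde C}(x) = \left(1 + (x - 11)^{4}\right)^{-1}
$$

The membership function $\mu_{\tilde D}(x)$ of the decision is then

$$
\mu_{\tilde D}(x) = \mu_{\tilde O}(x) \wedge \mu_{\tilde C}(x)
$$

$$
\mu_{\tilde D}(x) =
\begin{cases}
\min\left\{\left(1 + (x - 10)^{-2}\right)^{-1},\ \left(1 + (x - 11)^{4}\right)^{-1}\right\} & \text{for } x > 10 \\
0 & \text{for } x \le 10
\end{cases}
=
\begin{cases}
\left(1 + (x - 11)^{4}\right)^{-1} & \text{for } x > 11.75 \\
\left(1 + (x - 10)^{-2}\right)^{-1} & \text{for } 10 < x \le 11.75 \\
0 & \text{for } x \le 10
\end{cases}
$$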

This relation is depicted in figure 14-2. Let us now modify example 14-1 
accordingly. 
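
The decision of example 14-2 can be checked numerically. The sketch below assumes the two membership functions in the form given by Bellman and Zadeh, an inverse-square term for the objective and a fourth-power term for the constraint; the function names and the search grid are illustrative choices, not part of the source.

```python
def mu_objective(x):
    """Membership of 'x should be substantially larger than 10'."""
    if x <= 10:
        return 0.0
    return 1.0 / (1.0 + (x - 10) ** -2)


def mu_constraint(x):
    """Membership of 'x should be in the vicinity of 11'."""
    return 1.0 / (1.0 + (x - 11) ** 4)


def mu_decision(x):
    """Decision = intersection of objective and constraint (min-operator)."""
    return min(mu_objective(x), mu_constraint(x))


# Hypothetical search grid: 5.000, 5.001, ..., 15.000
grid = [i / 1000 for i in range(5000, 15001)]
x_best = max(grid, key=mu_decision)
best_degree = mu_decision(x_best)
```

On this grid the highest degree of membership is reached close to the breakpoint x = 11.75, where the two curves intersect: to the left the objective is the binding function, to the right the constraint.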


Example 14-3 


The board of directors is trying to find the “optimal” dividend to be paid to the 
shareholders. For financial reasons this dividend ought to be attractive, and for 
reasons of wage negotiations it should be modest.

Figure 14-2. A fuzzy decision. (Labels: Objective Function, Decision; horizontal axis: x, 0 to 15.)


The fuzzy set of the objective function “attractive dividend” could, for
instance, be defined by:

$$
\mu_{\tilde O}(x) =
\begin{cases}
1 & x \ge 5.8 \\
\frac{1}{2100}\left[-29x^{3} + 366x^{2} - 877x + 540\right] & 1 < x < 5.8 \\
0 & x \le 1
\end{cases}
$$

The fuzzy set (constraint) “modest dividend” could be represented by

$$
\mu_{\tilde C}(x) =
\begin{cases}
1 & x \le 1.2 \\
\frac{1}{2100}\left[29x^{3} - 243x^{2} + 16x + 2{,}388\right] & 1.2 < x < 6 \\
0 & x \ge 6
\end{cases}
$$

The fuzzy set “decision” is then characterized by its membership function

$$
\mu_{\tilde D}(x) = \min\{\mu_{\tilde O}(x), \mu_{\tilde C}(x)\}
$$

If the decision maker wants to have a “crisp” decision proposal, it seems
appropriate to suggest the dividend with the highest degree of membership in the
fuzzy set “decision.” Let us call this the “maximizing decision,” defined by

$$
x_{\max} = \arg\left(\max_{x} \min\{\mu_{\tilde O}(x), \mu_{\tilde C}(x)\}\right)
$$


Figure 14—3 sketches this situation. 
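
A small numerical sketch of the maximizing decision in example 14-3 follows. It rests on two assumptions that should be read as such: the signs of the cubic coefficients are taken so that both functions approximately meet their boundary values (0 at x = 1 and 1 at x = 5.8 for “attractive,” 1 at x = 1.2 and 0 at x = 6 for “modest”), and the polynomial values are clamped to [0, 1] because the cubics dip slightly below zero near one end of their intervals. The grid is an arbitrary choice.

```python
def clamp(value):
    """Keep a polynomial value inside the unit interval (added safeguard)."""
    return max(0.0, min(1.0, value))


def mu_attractive(x):
    """Objective: 'attractive dividend' (dividend x in percent)."""
    if x >= 5.8:
        return 1.0
    if x <= 1.0:
        return 0.0
    return clamp((-29 * x**3 + 366 * x**2 - 877 * x + 540) / 2100)


def mu_modest(x):
    """Constraint: 'modest dividend'."""
    if x <= 1.2:
        return 1.0
    if x >= 6.0:
        return 0.0
    return clamp((29 * x**3 - 243 * x**2 + 16 * x + 2388) / 2100)


def mu_decision(x):
    """Fuzzy decision as the min-intersection of objective and constraint."""
    return min(mu_attractive(x), mu_modest(x))


# Hypothetical grid of candidate dividends: 0.000, 0.001, ..., 7.000
grid = [i / 1000 for i in range(0, 7001)]
x_max = max(grid, key=mu_decision)
degree = mu_decision(x_max)
```

Under these assumptions the two curves intersect at a dividend of 3.5%, which is therefore the maximizing decision, with a degree of membership of roughly 0.34.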
After these introductory remarks and examples, we shall formally define a
decision in a fuzzy environment in the sense of Bellman and Zadeh.

Figure 14-3. Optimal dividend as maximizing decision.


Definition 14-1 [Bellman and Zadeh 1970, B-148] 


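Assume that we are given a fuzzy goal $\tilde G$ and a fuzzy constraint $\tilde C$ in a space of
alternatives X. Then $\tilde G$ and $\tilde C$ combine to form a decision, $\tilde D$, which is a fuzzy
set resulting from intersection of $\tilde G$ and $\tilde C$. In symbols, $\tilde D = \tilde G \cap \tilde C$, and
correspondingly,

$$
\mu_{\tilde D} = \min\{\mu_{\tilde G}, \mu_{\tilde C}\}
$$

More generally, suppose that we have n goals $\tilde G_1, \ldots, \tilde G_n$ and m constraints
$\tilde C_1, \ldots, \tilde C_m$. Then the resultant decision is the intersection of the given goals
$\tilde G_1, \ldots, \tilde G_n$ and the given constraints $\tilde C_1, \ldots, \tilde C_m$. That is,

$$
\tilde D = \tilde G_1 \cap \tilde G_2 \cap \cdots \cap \tilde G_n \cap \tilde C_1 \cap \tilde C_2 \cap \cdots \cap \tilde C_m
$$

and correspondingly

$$
\mu_{\tilde D} = \min\{\mu_{\tilde G_1}, \mu_{\tilde G_2}, \ldots, \mu_{\tilde G_n}, \mu_{\tilde C_1}, \mu_{\tilde C_2}, \ldots, \mu_{\tilde C_m}\}
= \min\{\mu_{\tilde G_j}, \mu_{\tilde C_i}\} = \min\{\mu_i\}
$$

Definition 14-1 implies essentially three assumptions: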


1. The “and” connecting goals and constraints in the model corresponds to the 
“logical and.” 

2. The logical “and” corresponds to the set-theoretic intersection. 

3. The intersection of fuzzy sets is defined in the possibilistic sense by the 
min-operator. 


Bellman and Zadeh indicated in their 1970 paper that the min-interpretation 
of the intersection might have to be modified depending on the context. “In short, 
a broad definition of the concept of decision may be stated as: Decision =
Confluence of Goals and Constraints” [Bellman and Zadeh 1970, B-149].

The question arises whether even the intersection interpretation is a generally 
acceptable assumption or whether “confluence” has to be interpreted in an even 
more general way. Let us consider the following example. 


Example 14-4 


An instructor at a university must decide how to grade written test papers. Let 
us assume that the problem to be solved in the test was a linear programming 
problem and that the student was free to solve it either graphically or using 
the simplex method. The student has done both. The student’s performance is 
expressed—for graphical solution as well as for the algebraic solution—as the 
achieved degree of membership in the fuzzy sets “good graphical solution” (G) 
and “good simplex solution” (S), respectively. Let us assume that he reaches

$$
\mu_{\tilde G} = 0.9 \quad \text{and} \quad \mu_{\tilde S} = 0.7
$$

If the grade to be awarded by the instructor corresponds to the degree of
membership of the fuzzy set “good solutions of linear programming problems,” it
would be quite conceivable that his grade $\mu_{\widetilde{LP}}$ could be determined by

$$
\mu_{\widetilde{LP}} = \max\{\mu_{\tilde G}, \mu_{\tilde S}\} = \max\{0.9, 0.7\} = 0.9
$$


The two definitions of decisions—as the intersection or the union of fuzzy sets— 
imply essentially the following: The interpretation of a decision as the intersec- 
tion of fuzzy sets implies no positive compensation (trade-off) between the 
degrees of membership of the fuzzy sets in question, if either the minimum or 
the product is used as an operator. Each of them yields a degree of membership
of the resulting fuzzy set (decision), which is on or below the lowest degree of
membership of all intersecting fuzzy sets (see example 14-3). 

The interpretation of a decision as the union of fuzzy sets, using the max- 
operator, leads to the maximum degree of membership achieved by any of the 
fuzzy sets representing objectives or constraints. This amounts to a full compen- 
sation of lower degrees of membership by the maximum degree of membership 
(see example 14-4). 
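
The difference between the noncompensatory and the fully compensatory operators can be made concrete with the two degrees of membership from example 14-4. The snippet below is only a sketch; the dictionary keys are illustrative labels, and the algebraic sum is used in its usual form a + b - ab.

```python
# Degrees of membership from example 14-4.
mu_g = 0.9   # "good graphical solution"
mu_s = 0.7   # "good simplex solution"

results = {
    "min": min(mu_g, mu_s),                      # intersection, no compensation
    "product": mu_g * mu_s,                      # intersection, below the minimum
    "max": max(mu_g, mu_s),                      # union, full compensation
    "algebraic sum": mu_g + mu_s - mu_g * mu_s,  # union, above the maximum
}

for name, value in results.items():
    print(f"{name:>13}: {value:.2f}")
```

The two intersection operators stay at or below the smaller degree of membership (0.7), while the two union operators stay at or above the larger one (0.9); no value strictly between 0.7 and 0.9 can be produced by any of them.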

Observing managerial decisions, one finds that there are hardly any decisions 
with no compensation between either different degrees of goal achievement or 
the degrees to which restrictions are limiting the scope of decisions. The com- 
pensation, however, rarely ever seems to be “complete,” as would be assumed 
using the max-operator. It may be argued that compensatory tendencies in human 
aggregation are responsible for the failure of some classical operators (min, 
product, max) in empirical investigations. 

Two conclusions can probably be drawn: Neither the noncompensatory “and” 
represented by operators that map between zero and the minimum degree of 
membership (min-operator, product-operator, Hamacher’s conjunction operator 
[definition 3—15], Yager’s conjunction operator [definition 3—16]) nor the fully 
compensatory “or” represented by the operators that map between the maximum 
degree of membership and 1 (maximum, algebraic sum, Hamacher’s disjunction 
operator, Yager’s disjunction operator) are appropriate to model the aggregation 
of fuzzy sets representing managerial decisions. 

“Confluence of Goals and Constraints” should therefore be interpreted as in 
definition 14—2. 


Definition 14-2 


Let $\mu_{\tilde C_i}(x)$, i = 1, ..., m, x ∈ X, be membership functions of constraints,
defining the decision space, and let $\mu_{\tilde G_j}(x)$, j = 1, ..., n, x ∈ X, be the membership
functions of objective (utility) functions or goals.

A decision is then defined by its membership function

$$
\mu_{\tilde D}(x) = \circledast_i\, \mu_{\tilde C_i}(x) * \circledast_j\, \mu_{\tilde G_j}(x), \quad i = 1, \ldots, m, \; j = 1, \ldots, n
$$

where $*$, $\circledast_i$, $\circledast_j$ denote appropriate, possibly context-dependent “aggregators”
(connectives).

We shall discuss the question of appropriate connectives in more detail in 
chapter fifteen. Before we turn to fuzzy mathematical programming, it should be 
mentioned that the symmetry that is a property of all definitions based on 
Bellman-Zadeh’s concept (irrespective of the operators used) is not considered 
adequate by all authors (for example, see Asai et al. [1975]).


14.2 Fuzzy Linear Programming


Linear programming models shall be considered as a special kind of decision 
model: The decision space is defined by the constraints; the “goal” (utility func- 
tion) is defined by the objective function; and the type of decision is deci- 
sion making under certainty. The classical model of linear programming can be 
stated as 


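$$
\begin{aligned}
\text{maximize} \quad & f(x) = c^{T}x \\
\text{such that} \quad & Ax \le b \\
& x \ge 0
\end{aligned}
\tag{14.1}
$$

with $c, x \in \mathbb{R}^{n}$, $b \in \mathbb{R}^{m}$, $A \in \mathbb{R}^{m \times n}$.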


Let us now depart from the classical assumptions that all coefficients of A, b, and
c are crisp numbers, that ≤ is meant in a crisp sense, and that “maximize” is a
strict imperative!

If we assume that the LP-decision has to be made in fuzzy environments, quite 
a number of possible modifications of model (14.1) exist. First of all, the deci- 
sion maker might not really want to actually maximize or minimize the objective 
function. Rather, he or she might want to reach some aspiration levels that might 
not even be definable crisply. Thus he or she might want to “improve the present 
cost situation considerably,” and so on. 

Secondly, the constraints might be vague in one of the following ways: The ≤
sign might not be meant in the strictly mathematical sense, but smaller violations
might well be acceptable. This can happen if the constraints represent aspiration 
levels as mentioned above or if, for instance, the constraints represent sensory 
requirements (taste, color, smell, etc.) that cannot adequately be approximated 
by a crisp constraint. Of course, the coefficients of the vectors b or c or of the 
matrix A itself can have a fuzzy character either because they are fuzzy in nature 
or because perception of them is fuzzy. 

Finally, the role of the constraints can be different from that in classical linear 
programming, where the violation of any single constraint by any amount renders 
the solution infeasible. The decision maker might accept small violations of
constraints but might also attach different (crisp or fuzzy) degrees of importance 
to violations of different constraints. Fuzzy linear programming offers a number 
of ways to allow for all these types of vagueness, and we shall discuss some of 
them below. 

First of all, one can either accept Bellman—Zadeh’s concept of a symmetrical 
decision model (see definition 14—1) or develop specific models on the basis of 
a nonsymmetrical basic model of a “fuzzy” decision [Orlovsky 1980; Asai et al.
1975]. Here we shall adopt the former, more common, approach. Secondly, one
has to decide how a fuzzy “maximize” is to be interpreted, or whether to stick to 
a crisp “maximize.” In the latter case, complications arise on how to connect a 
crisp objective function with a fuzzy solution space. We will discuss one approach 
for a fuzzy goal and one approach for a crisp objective function. 

Finally, one has to decide where and how fuzziness enters the constraints. 
Some authors [Tanaka and Asai 1984] consider the coefficients of A, b, c as fuzzy 
numbers and the constraints as fuzzy functions. We shall here adopt another
approach that seems to be more efficient computationally and more closely
resembles Bellman-Zadeh’s model in definition 14-1: We shall represent the goal and
the constraints by fuzzy sets and then aggregate them in order to derive a maxi- 
mizing decision. 

In both approaches, one also has to decide on the type of membership func- 
tion characterizing either the fuzzy numbers or the fuzzy sets representing goal 
and constraints. 

In classical LP, the “violation” of any constraint in model (14.1) renders the 
solution infeasible. Hence all constraints are considered to be of equal weight or 
importance. When departing from classical LP, this conclusion is no longer true, 
and one also has to worry about the relative weights attached to the constraints. 

Before we develop a specific model of linear programming in a fuzzy envi- 
ronment, it should have become clear that in contrast to classical linear pro- 
gramming, “fuzzy linear programming” is not a uniquely defined type of model; 
many variations are possible, depending on the assumptions or features of the 
real situation to be modeled. 


14.2.1 Symmetric Fuzzy LP 


Let us now turn to a first basic model for “fuzzy linear programming.” In model 
(14.1), we shall assume that the decision maker can establish an aspiration level, 
z, for the value of the objective function he or she wants to achieve and that each 
of the constraints is modeled as a fuzzy set. Our fuzzy LP then becomes: 

Find x such that

$$
\begin{aligned}
c^{T}x &\gtrsim z \\
Ax &\lesssim b \\
x &\ge 0
\end{aligned}
\tag{14.2}
$$

Here $\lesssim$ denotes the fuzzified version of ≤ and has the linguistic interpretation
“essentially smaller than or equal to.” $\gtrsim$ denotes the fuzzified version of ≥
and has the linguistic interpretation “essentially greater than or equal to.” The
objective function in model (14.1) might have to be written as a minimizing goal
in order to consider z as an upper bound.

We see that model (14.2) is fully symmetric with respect to objective function
and constraints, and we want to make that even more obvious by substituting
$\begin{pmatrix} -c^{T} \\ A \end{pmatrix} = B$ and $\begin{pmatrix} -z \\ b \end{pmatrix} = d$. Then model (14.2) becomes:

Find x such that

$$
\begin{aligned}
Bx &\lesssim d \\
x &\ge 0
\end{aligned}
\tag{14.3}
$$


Each of the (m + 1) rows of model (14.3) shall now be represented by a fuzzy
set, the membership functions of which are $\mu_i(x)$. Following definition 14-1, the
membership function of the fuzzy set “decision” of model (14.3) is

$$
\mu_{\tilde D}(x) = \min_{i}\{\mu_i(x)\} \tag{14.4}
$$

$\mu_i(x)$ can be interpreted as the degree to which x fulfills (satisfies) the fuzzy
inequality $B_i x \lesssim d_i$ (where $B_i$ denotes the ith row of B).

Assuming that the decision maker is interested not in a fuzzy set but in a crisp
“optimal” solution, we could suggest the “maximizing solution” to equation
(14.4), which is the solution to the possibly nonlinear programming problem

$$
\max_{x \ge 0} \min_{i}\{\mu_i(x)\} = \max_{x \ge 0} \mu_{\tilde D}(x) \tag{14.5}
$$


Now we have to specify the membership functions $\mu_i(x)$. $\mu_i(x)$ should be 0 if
the constraints (including the objective function) are strongly violated, and 1 if
they are very well satisfied (i.e., satisfied in the crisp sense); and $\mu_i(x)$ should
increase monotonically from 0 to 1, that is,

$$
\mu_i(x) =
\begin{cases}
1 & \text{if } B_i x \le d_i \\
\in [0, 1] & \text{if } d_i < B_i x \le d_i + p_i \\
0 & \text{if } B_i x > d_i + p_i
\end{cases}
\qquad i = 1, \ldots, m + 1 \tag{14.6}
$$

Using the simplest type of membership function, we assume them to be linearly
increasing over the “tolerance interval” $p_i$:

$$
\mu_i(x) =
\begin{cases}
1 & \text{if } B_i x \le d_i \\
1 - \dfrac{B_i x - d_i}{p_i} & \text{if } d_i < B_i x \le d_i + p_i \\
0 & \text{if } B_i x > d_i + p_i
\end{cases}
\qquad i = 1, \ldots, m + 1 \tag{14.7}
$$


The $p_i$ are subjectively chosen constants of admissible violations of the
constraints and the objective function. Substituting equation (14.7) into problem
(14.5) yields, after some rearrangements [Zimmermann 1976] and with some
additional assumptions,

$$
\max_{x \ge 0} \min_{i}\left(1 - \frac{B_i x - d_i}{p_i}\right) \tag{14.8}
$$

Introducing one new variable, λ, which corresponds essentially to equation
(14.4), we arrive at

$$
\begin{aligned}
\text{maximize} \quad & \lambda \\
\text{such that} \quad & \lambda p_i + B_i x \le d_i + p_i, \quad i = 1, \ldots, m + 1 \\
& x \ge 0
\end{aligned}
\tag{14.9}
$$

If the optimal solution to problem (14.9) is the vector $(\lambda_0, x_0)$, then $x_0$ is the
maximizing solution (14.5) of model (14.2), assuming membership functions as
specified in (14.7).
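
The step from (14.8) to (14.9) can be spelled out as follows. The sketch assumes that every $p_i$ is strictly positive and that the optimum lies on the linear part of each membership function, so that the truncation of (14.7) to the interval [0, 1] can be ignored; these are presumably among the additional assumptions referred to above.

```latex
% lambda is a lower bound on every membership degree:
\lambda \le \mu_i(x) = 1 - \frac{B_i x - d_i}{p_i}
  \qquad i = 1, \ldots, m + 1
% multiply by p_i > 0 and collect terms:
\iff \lambda p_i \le p_i - B_i x + d_i
\iff \lambda p_i + B_i x \le d_i + p_i
% maximizing the common lower bound equals maximizing the minimum:
\max_{x \ge 0} \min_i \mu_i(x)
  = \max \{\, \lambda \mid \lambda \le \mu_i(x) \ \forall i,\ x \ge 0 \,\}
```

Each row of (14.9) is therefore linear in (λ, x), which is why a standard LP solver suffices.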

The reader should realize that this maximizing solution can be found by 
solving one standard (crisp) LP with only one more variable and one more con- 
straint than in model (14.3). Consequently, this approach is computationally very 
efficient. 

A slightly modified version of models (14.8) and (14.9), respectively, results
if the membership functions are defined as follows: A variable $t_i$, i = 1, ..., m + 1,
$0 \le t_i \le p_i$, is defined that measures the degree of violation of the ith constraint.
The membership function of the ith row is then

$$
\mu_i(x) = 1 - \frac{t_i}{p_i} \tag{14.10}
$$

The crisp equivalent model is then

$$
\begin{aligned}
\text{maximize} \quad & \lambda \\
\text{such that} \quad & \lambda p_i + t_i \le p_i, \quad i = 1, \ldots, m + 1 \\
& B_i x - t_i \le d_i \\
& t_i \le p_i \\
& x, t \ge 0
\end{aligned}
\tag{14.11}
$$


This model is larger than model (14.9), even though the set of constraints
$t_i \le p_i$ is actually redundant. Model (14.11) has some advantages, however, in
particular when performing sensitivity analysis, which will be discussed in the
second volume on decisions in fuzzy environments. 


Example 14-5 


A company wanted to decide on the size and structure of its truck fleet. Four
differently sized trucks ($x_1$ through $x_4$) were considered. The objective was to
minimize cost, and the constraints were to supply all customers (who have a
strong seasonally fluctuating demand). This meant certain quantities had to be 
moved (quantity constraint) and a minimum number of customers per day had to 
be contacted (routing constraint). For other reasons, it was required that at least 
six of the smallest trucks be included in the fleet. The management wanted to use 
quantitative analysis and agreed to the following suggested linear programming 
approach: 


$$
\begin{aligned}
\text{minimize} \quad & 41{,}400x_1 + 44{,}300x_2 + 48{,}100x_3 + 49{,}100x_4 \\
\text{subject to} \quad & 0.84x_1 + 1.44x_2 + 2.16x_3 + 2.4x_4 \ge 170 \\
& 16x_1 + 16x_2 + 16x_3 + 16x_4 \ge 1{,}300 \\
& x_1 \ge 6 \\
& x_2, x_3, x_4 \ge 0
\end{aligned}
$$

The solution was $x_1 = 6$, $x_2 = 16.29$, $x_3 = 0$, $x_4 = 58.96$, and Min Cost =
3,864,975. When the results were presented to management, it turned out that the
findings were considered acceptable but that the management would rather have 
some “leeway” in the constraints. Management felt that because demand fore- 
casts had been used to formulate the constraints (and because forecasts never turn 
out to be correct!), there was a danger of not being able to meet higher demands 
by their customers. 

When they were asked whether or not they really wanted to “minimize trans- 
portation cost,” they answered: Now you are joking. A few months ago you told 
us that we have to minimize cost; otherwise, you could not model our problem. 
Nobody knows minimum cost anyway. The budget shows a cost figure of $4.2 
million, a figure that must not be exceeded. If you want to keep your contract, 
you better stay considerably below this figure. 

Since management felt forced into giving precise constraints (because of the 
model) in spite of the fact that it would rather have given some intervals, model 
(14.3) was selected to model the management’s perceptions of the problem sat- 
isfactorily. The following parameters were estimated: 

Lower bounds of the tolerance interval:

$$
d_1 = 3{,}700{,}000 \qquad d_2 = 170 \qquad d_3 = 1{,}300 \qquad d_4 = 6
$$

Spreads of tolerance intervals:

$$
p_1 = 500{,}000 \qquad p_2 = 10 \qquad p_3 = 100 \qquad p_4 = 6
$$

After dividing all rows by their respective $p_i$’s and rearranging in such a way
that only λ remains on the left-hand side, our problem in the form of (14.9)
became:

$$
\begin{aligned}
\text{maximize} \quad & \lambda \\
\text{subject to} \quad & 0.083x_1 + 0.089x_2 + 0.096x_3 + 0.098x_4 + \lambda \le 8.4 \\
& 0.084x_1 + 0.144x_2 + 0.216x_3 + 0.24x_4 - \lambda \ge 17 \\
& 0.16x_1 + 0.16x_2 + 0.16x_3 + 0.16x_4 - \lambda \ge 13 \\
& 0.167x_1 - \lambda \ge 1 \\
& \lambda, x_1, x_2, x_3, x_4 \ge 0
\end{aligned}
$$


The solution is as follows:

                  Unfuzzy          Fuzzy

   x1             6                17.414
   x2             16.29            0
   x4             58.96            66.54
   Z              3,864,975        3,988,250

   Constraints:
   1.             170              174.33
   2.             1,300            1,343.328
   3.             6                17.414


As can be seen from the solution, “leeway” has been provided with respect to all
constraints at an additional cost of 3.2%.
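
The degrees of satisfaction behind these two solutions can be recomputed directly from the linear membership functions (14.7). The sketch below is an illustration only: the variable names are hypothetical, the “greater than or equal” rows are written with their lower bounds and spreads as listed in the example, and $x_3$ of the fuzzy solution is taken as 0 because the table does not list it.

```python
COST = [41400, 44300, 48100, 49100]
QUANTITY = [0.84, 1.44, 2.16, 2.4]
ROUTING = [16, 16, 16, 16]


def clamp(value):
    """Truncate a linear membership value to the interval [0, 1]."""
    return max(0.0, min(1.0, value))


def dot(row, x):
    return sum(r * xi for r, xi in zip(row, x))


def memberships(x):
    """Degrees of satisfaction of the four fuzzy rows for a fleet x."""
    return [
        # cost: fully satisfied at 3.7 million, unacceptable from 4.2 million
        clamp(1 - (dot(COST, x) - 3_700_000) / 500_000),
        # quantity: 0 at the lower bound 170, 1 from 180 upward
        clamp((dot(QUANTITY, x) - 170) / 10),
        # routing: 0 at the lower bound 1,300, 1 from 1,400 upward
        clamp((dot(ROUTING, x) - 1300) / 100),
        # small trucks: 0 at 6 trucks, 1 from 12 trucks upward
        clamp((x[0] - 6) / 6),
    ]


crisp = [6, 16.29, 0, 58.96]      # solution of the unfuzzy LP
fuzzy = [17.414, 0, 0, 66.54]     # solution of the fuzzy LP (x3 assumed 0)

mu_crisp = min(memberships(crisp))
mu_fuzzy = min(memberships(fuzzy))
```

The crisp solution sits exactly on the lower bounds and therefore has a degree of membership of 0 in the fuzzy decision, whereas the fuzzy solution reaches a degree of roughly 0.42, limited by the cost row.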

The main advantage, compared to the unfuzzy problem formulation, is the fact 
that the decision maker is not forced into a precise formulation because of math- 
ematical reasons even though he or she might only be able or willing to describe 
the problem in fuzzy terms. Linear membership functions are obviously only a 
very rough approximation. Membership functions that monotonically increase or 
decrease, respectively, in the interval $[d_i, d_i + p_i]$ can also be handled quite
easily, as will be shown later.

So far, the objective function and all constraints were considered fuzzy. If
some of the constraints are crisp, $Dx \le b$, then these constraints can easily be
added to formulations (14.9) or (14.11), respectively. Thus problem (14.9) would,
for instance, become:

$$
\begin{aligned}
\text{maximize} \quad & \lambda \\
\text{such that} \quad & \lambda p_i + B_i x \le d_i + p_i, \quad i = 1, \ldots, m + 1 \\
& Dx \le b \\
& x, \lambda \ge 0
\end{aligned}
\tag{14.12}
$$

Let us now turn to the case in which the objective function is crisp and the solu- 
tion space is fuzzy. 


14.2.2 Fuzzy LP with Crisp Objective Function 


A model in which the objective function is crisp—that is, has to be maximized 
or minimized—and in which the constraints are all or partially fuzzy is no longer 
symmetrical. The roles of objective functions and constraints are different; the 
latter define the decision space in a crisp or fuzzy way, and the former induce an 
order of the decision alternatives. Therefore the approach of models (14.3)-(14.5) 
is not applicable. The main problem is the scaling of the objective function (the 
domain of which is not normalized) when aggregating it with the (normalized) 
constraints. In very rare real cases, a scaling factor can be found that has a real 
justification. 

The problem we face is the determination of an extremum of a crisp function 
over a fuzzy domain, which we have already discussed in section 7.2 of this 
book. In definition 7—3, we defined the notion of a maximizing set that we will 
specify here and use as a vehicle to solve our LP problem. Two approaches are 
conceivable: 


1. The determination of the fuzzy set “decision.” 
2. The determination of a crisp “maximizing decision” by aggregating the 
objective function after appropriate transformations with the constraints. 


1. The Determination of a Fuzzy Set “Decision.” Orlovski [1977] suggests
computing, for all α-level sets of the solution space, the corresponding optimal
values of the objective function and considering as the fuzzy set “decision” the
optimal values of the objective functions, with the degree of membership equal
to the corresponding α-level of the solution space.


Definition 14-3 [Werners 1984] 


Let $R_\alpha = \{x \mid x \in X,\ \mu_{\tilde R}(x) \ge \alpha\}$ be the α-level sets of the solution space and
$N(\alpha) = \{x \mid x \in R_\alpha,\ f(x) = \sup_{x' \in R_\alpha} f(x')\}$ the set of optimal solutions for each α-level set.
The fuzzy set “decision” is then defined by the membership function

$$
\mu_{\mathrm{opt}}(x) =
\begin{cases}
\sup_{x \in N(\alpha)} \alpha & \text{if } x \in \bigcup_{\alpha > 0} N(\alpha) \\
0 & \text{else}
\end{cases}
$$

The fuzzy set “optimal values of the objective function” has the membership
function

$$
\mu_f(r) =
\begin{cases}
\sup_{x \in f^{-1}(r)} \mu_{\mathrm{opt}}(x) & \text{if } r \in \mathbb{R}^{1} \wedge f^{-1}(r) \ne \emptyset \\
0 & \text{else}
\end{cases}
$$

f(x) is the objective function with functional values r.

For the case of linear programming, the determination of the r’s and $\mu_f(r)$
can be obtained by parametric programming [Chanas 1983]. For each α, an LP
of the following kind would have to be solved:

$$
\begin{aligned}
\text{maximize} \quad & f(x) \\
\text{such that} \quad & \alpha \le \mu_i(x), \quad i = 1, \ldots, m \\
& x \in X
\end{aligned}
\tag{14.13}
$$

The reader should realize, however, that the result is a fuzzy set and that the
decision maker would have to decide which pair $(r, \mu_f(r))$ he or she considers optimal
if he or she wants to arrive at a crisp optimal solution.


Example 14-6 [Werners 1984] 
Consider the LP-Model

$$
\begin{aligned}
\text{maximize} \quad & z = 2x_1 + x_2 \\
\text{such that} \quad & x_1 \lesssim 3 \\
& x_1 + x_2 \lesssim 4 \\
& 0.5x_1 + x_2 \lesssim 3 \\
& x_1, x_2 \ge 0
\end{aligned}
$$

The “tolerance intervals” of the constraints are $p_1 = 6$, $p_2 = 4$, $p_3 = 2$.
The parametric linear program for determining the relationships between f(x)
= r and degree of membership is then

$$
\begin{aligned}
\text{maximize} \quad & z = 2x_1 + x_2 \\
\text{such that} \quad & x_1 \le 9 - 6\alpha \\
& x_1 + x_2 \le 8 - 4\alpha \\
& 0.5x_1 + x_2 \le 5 - 2\alpha \\
& x_1, x_2 \ge 0
\end{aligned}
$$

Figure 14-4. Feasible regions for $\mu_{\tilde R}(x) = 0$ and $\mu_{\tilde R}(x) = 1$.

Figure 14-4 shows the feasible regions $R_0$ and $R_1$ for $\mu_{\tilde R}(x) = 0$ and $\mu_{\tilde R}(x) = 1$.
Figure 14-5 shows the resulting membership function $\mu_f(r)$. Additionally,
figure 14-5 shows the membership function of the goal and the fuzzy decision
that will be discussed below.

Obviously, the decision maker has to decide which combination $(r, \mu_f(r))$ he
or she considers best.

Decision aids in this respect either can be derived from external sources or 
may depend on the problem itself. In the following, we shall consider an approach 
that suggests a crisp solution dependent on the solution space. 


2. The Determination of a Crisp Maximizing Decision. Some authors
[Kickert 1978; Nguyen 1979; Zadeh 1972] suggest approaches based on the 
notion of a maximizing set, which seem to have some disadvantages (see Werners 
[1984]). We shall therefore present a model that is particularly suitable for the 
type of linear programming model we are considering here. Werners [1984]
suggests the following definition.

Figure 14-5. Fuzzy decision. (Horizontal axis: r.)


Definition 14-4 


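Let $f: X \to \mathbb{R}^{1}$ be the objective function, $\tilde R$ a fuzzy region (solution space), and
$S(\tilde R)$ the support of this region. The maximizing set over the fuzzy region, MR(f),
is then defined by its membership function

$$
\mu_{MR(f)}(x) =
\begin{cases}
0 & \text{if } f(x) \le \inf_{S(\tilde R)} f \\
\dfrac{f(x) - \inf_{S(\tilde R)} f}{\sup_{S(\tilde R)} f - \inf_{S(\tilde R)} f} & \text{if } \inf_{S(\tilde R)} f < f(x) < \sup_{S(\tilde R)} f \\
1 & \text{if } \sup_{S(\tilde R)} f \le f(x)
\end{cases}
$$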


The intersection of this maximizing set with the fuzzy set "decision" (figure 14-5)
could then be used to compute a maximizing decision $x^0$ as the solution with the
highest degree of membership in this fuzzy set. It does not seem reasonable, however,
that the judgment of the decision maker is calibrated by looking at the smallest value
of $f$ over the feasible region. A better benchmark would be the largest value of
$f$ that can be obtained at a degree of membership of 1 in the feasible region. This
leads to the following definition.

Definition 14-5 [Werners 1984]


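Let $f: X \to \mathbb{R}^1$ be the objective function, $\tilde R$ = fuzzy feasible region,
$S(\tilde R)$ = support of $\tilde R$, and $R_1$ = $\alpha$-level cut of $\tilde R$ for $\alpha = 1$.
The membership function of the goal (objective function) given solution space
$\tilde R$ is then defined as

$$
\mu_{\tilde G}(x) =
\begin{cases}
0 & \text{if } f(x) \le \sup_{R_1} f \\[4pt]
\dfrac{f(x) - \sup_{R_1} f}{\sup_{S(\tilde R)} f - \sup_{R_1} f} & \text{if } \sup_{R_1} f < f(x) < \sup_{S(\tilde R)} f \\[4pt]
1 & \text{if } \sup_{S(\tilde R)} f \le f(x)
\end{cases}
$$

The corresponding membership function in functional space is then

$$
\mu_{\tilde G}(r) :=
\begin{cases}
\sup_{x \in f^{-1}(r)} \mu_{\tilde G}(x) & \text{if } r \in \mathbb{R},\ f^{-1}(r) \ne \emptyset \\[4pt]
0 & \text{else}
\end{cases}
$$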


Example 14-7 
Consider the model of example 14-6. For this model, $R_1$ is the region defined by

$$
\begin{aligned}
x_1 &\le 3 \\
x_1 + x_2 &\le 4 \\
.5x_1 + x_2 &\le 3 \\
x_1, x_2 &\ge 0
\end{aligned}
$$

The supremum of $f$ over this region is

$$\sup_{R_1} \, (2x_1 + x_2) = 7$$

Figure 14-5 shows the membership functions $\mu_f(r)$ and $\mu_{\tilde G}(r)$. Using the
min-max approach, the resulting solution is $x_1^0 = 5.84$, $x_2^0 = .05$, $r_0 = 11.73$, and
the attained degree of membership is $\mu_{\tilde D}(x^0) = .53$.
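The two calibration points of definition 14-5 can be checked numerically. The following is a minimal sketch, not part of the original treatment: it assumes the tolerances 6, 4, and 2 read off the parametric constraints of example 14-6, and it uses a hypothetical helper `maximize_2d` that enumerates the vertices of the two-dimensional feasible polygon instead of calling an LP code.

```python
from itertools import combinations

def maximize_2d(c, constraints):
    """Maximize c[0]*x1 + c[1]*x2 subject to rows (a1, a2, b) meaning
    a1*x1 + a2*x2 <= b, by enumerating vertices of a bounded polygon."""
    best_value, best_point = None, None
    for (a1, a2, b1), (a3, a4, b2) in combinations(constraints, 2):
        det = a1 * a4 - a2 * a3
        if abs(det) < 1e-12:          # parallel boundaries: no vertex
            continue
        x1 = (b1 * a4 - a2 * b2) / det
        x2 = (a1 * b2 - a3 * b1) / det
        if all(p * x1 + q * x2 <= r + 1e-9 for p, q, r in constraints):
            value = c[0] * x1 + c[1] * x2
            if best_value is None or value > best_value:
                best_value, best_point = value, (x1, x2)
    return best_value, best_point

# Rows (a1, a2, b, p): coefficients, crisp bound b, tolerance p (assumed).
ROWS = [(1.0, 0.0, 3.0, 6.0), (1.0, 1.0, 4.0, 4.0), (0.5, 1.0, 3.0, 2.0)]

def alpha_cut(alpha):
    """Crisp constraints of the alpha-level cut, plus nonnegativity."""
    cut = [(a1, a2, b + (1.0 - alpha) * p) for a1, a2, b, p in ROWS]
    return cut + [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]

def goal_membership(f_value, f1, f0):
    """Membership of the goal as in definition 14-5."""
    if f_value <= f1:
        return 0.0
    if f_value >= f0:
        return 1.0
    return (f_value - f1) / (f0 - f1)

f1, x_at_f1 = maximize_2d((2.0, 1.0), alpha_cut(1.0))   # sup of f over R_1
f0, x_at_f0 = maximize_2d((2.0, 1.0), alpha_cut(0.0))   # sup of f over S(R)
print(f1, f0, goal_membership(11.73, f1, f0))
```

Under these assumptions the sketch returns 7 for the supremum over $R_1$, and the goal membership at $r_0 = 11.73$ is close to the value .53 quoted above.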

Let us now return to model (14.2) and modify it by considering the objective
function to be crisp and by adding a set of crisp constraints $Dx \le b'$:

$$
\begin{aligned}
\text{maximize } & f(x) = c^T x \\
\text{such that } & Ax \lesssim b \\
& Dx \le b' \\
& x \ge 0
\end{aligned}
\tag{14.14}
$$

Let the membership functions of the fuzzy sets representing the fuzzy constraints
be defined in analogy to equation (14.7) as

$$
\mu_i(x) =
\begin{cases}
1 & \text{if } A_i x \le b_i \\[4pt]
\dfrac{b_i + p_i - A_i x}{p_i} & \text{if } b_i < A_i x \le b_i + p_i \\[4pt]
0 & \text{if } A_i x > b_i + p_i
\end{cases}
\tag{14.15}
$$
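As a small illustration of equation (14.15), the following sketch evaluates the linear membership function of a single fuzzy constraint. The function name `constraint_membership` and its arguments are chosen here for illustration only.

```python
def constraint_membership(lhs, b, p):
    """Degree to which a constraint row is satisfied (eq. 14.15).

    lhs is the value A_i x, b the crisp bound b_i, p > 0 the tolerance p_i.
    """
    if lhs <= b:
        return 1.0
    if lhs > b + p:
        return 0.0
    return (b + p - lhs) / p

# A row with bound 3 and tolerance 6, evaluated at three left-hand sides.
for value in (3.0, 6.0, 9.5):
    print(value, constraint_membership(value, 3.0, 6.0))
```

Halfway through the tolerance interval (left-hand side 6 for bound 3 and tolerance 6) the degree of membership is 0.5.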


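The membership function of the objective function (14.5) can be determined by
solving the following two LPs:

$$
\begin{aligned}
\text{maximize } & f(x) = c^T x \\
\text{such that } & Ax \le b \\
& Dx \le b' \\
& x \ge 0
\end{aligned}
\tag{14.16}
$$

yielding $\sup_{R_1} f = (c^T x)_{\text{opt}} = f_1$; and

$$
\begin{aligned}
\text{maximize } & f(x) = c^T x \\
\text{such that } & Ax \le b + p \\
& Dx \le b' \\
& x \ge 0
\end{aligned}
\tag{14.17}
$$

yielding $\sup_{S(\tilde R)} f = (c^T x)_{\text{opt}} = f_0$.
The membership function of the objective function is therefore

$$
\mu_{\tilde G}(x) =
\begin{cases}
1 & \text{if } f_0 \le c^T x \\[4pt]
\dfrac{c^T x - f_1}{f_0 - f_1} & \text{if } f_1 < c^T x < f_0 \\[4pt]
0 & \text{if } c^T x \le f_1
\end{cases}
\tag{14.18}
$$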


Now we have again achieved "symmetry" between constraints and the objective
function, and we can employ the approach we used to derive model (14.9)
as an equivalent formulation of model (14.2).

The equivalent model to (14.6) is

$$
\begin{aligned}
\text{maximize } & \lambda \\
\text{such that } & \lambda (f_0 - f_1) - c^T x \le -f_1 \\
& \lambda p + Ax \le b + p \\
& Dx \le b' \\
& \lambda \le 1 \\
& \lambda, x \ge 0
\end{aligned}
\tag{14.19}
$$

Example 14-8


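We shall again consider the model in example 14-6. In example 14-7, we have
computed $f_1 = 7$. By solving problem (14.17), we obtain $f_0 = 16$. Therefore
problem (14.19) is

$$
\begin{aligned}
\text{maximize } & \lambda \\
\text{such that } & 9\lambda - 2x_1 - x_2 \le -7 \\
& 6\lambda + x_1 \le 9 \\
& 4\lambda + x_1 + x_2 \le 8 \\
& 2\lambda + .5x_1 + x_2 \le 5 \\
& \lambda \le 1 \\
& \lambda, x_1, x_2 \ge 0
\end{aligned}
$$

The solution to this problem is $x_1^0 = 5.84$, $x_2^0 = .05$, $\lambda_0 = .53$, which agrees
with the solution found in example 14-7.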
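Problem (14.19) for this example is small enough to be solved by brute force. The sketch below is an illustration under stated assumptions rather than a recommended LP method: it assumes the coefficient .5 in the third constraint row of example 14-6, and it enumerates all intersections of three constraint planes with a hypothetical Cramer's-rule helper `solve3`, keeping the feasible vertex with the largest $\lambda$.

```python
from itertools import combinations

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(rows, rhs):
    """Solve a 3x3 linear system by Cramer's rule; None if singular."""
    d = det3(rows)
    if abs(d) < 1e-12:
        return None
    solution = []
    for k in range(3):
        replaced = [list(r) for r in rows]
        for r, value in zip(replaced, rhs):
            r[k] = value                  # swap column k for the right-hand side
        solution.append(det3(replaced) / d)
    return solution

# Rows (coef. of lambda, x1, x2, right-hand side) of "a . v <= b".
CONSTRAINTS = [
    (9.0, -2.0, -1.0, -7.0),   # goal row: 9*lam - 2*x1 - x2 <= -7
    (6.0, 1.0, 0.0, 9.0),
    (4.0, 1.0, 1.0, 8.0),
    (2.0, 0.5, 1.0, 5.0),      # coefficient .5 is an assumption (see lead-in)
    (1.0, 0.0, 0.0, 1.0),      # lam <= 1
    (-1.0, 0.0, 0.0, 0.0),     # lam >= 0
    (0.0, -1.0, 0.0, 0.0),     # x1 >= 0
    (0.0, 0.0, -1.0, 0.0),     # x2 >= 0
]

def max_lambda(constraints):
    """Return the feasible vertex (lam, x1, x2) with the largest lam."""
    best = None
    for trio in combinations(constraints, 3):
        point = solve3([row[:3] for row in trio], [row[3] for row in trio])
        if point is None:
            continue
        feasible = all(
            sum(a * v for a, v in zip(row[:3], point)) <= row[3] + 1e-9
            for row in constraints
        )
        if feasible and (best is None or point[0] > best[0]):
            best = point
    return best

lam, x1, x2 = max_lambda(CONSTRAINTS)
print(round(lam, 3), round(x1, 2), round(x2, 2))
```

Adding the second and third rows and comparing with the goal row gives $19\lambda \le 10$ for every feasible point, so the vertex found by the enumeration is indeed optimal.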


Before we turn to fuzzy dynamic programming, it should be mentioned that on 
the basis of the approach described so far, suggestions have been published for 
a duality theory [Rödder and Zimmermann 1980], for sensitivity analysis in fuzzy
linear programming [Hamacher, Leberling, and Zimmermann 1978], for integer 
fuzzy programming [Zimmermann and Pollatschek 1984], and for the use of other 
than linear membership functions and other operators [Werners 1988]. These 
topics will not, however, be discussed here. They have been discussed in more 
detail in Zimmermann [1987]. Other approaches introducing fuzziness into math- 
ematical programming have been published by a number of authors (see, for 
instance, [Slowinski 1998], [Wang et al. 2001], [Sakawa and Nishizaki 2001],
[Jamison and Lodwick 2001], [Sengupta et al. 2001]). Often these approaches 
have been developed in the context of multi objective decision making. In order 
to avoid duplication, these approaches will be mentioned at the end of the
discussion of the vector-maximum problem in section 14.4.


14.3 Fuzzy Dynamic Programming 


Traditional dynamic programming [Bellman 1957] is a technique well known in
operations research and used to solve optimization problems that can be
decomposed into subproblems of one variable (decision variable) each. The idea
underlying dynamic programming is to view the problem as a multistage decision
process, the optimal policy to which can be determined recursively.

Generally the problem is formulated in terms of state variables, $x_i$; decision
variables, $d_i$; stage rewards, $r_i(x_i, d_i)$; a reward function,
$R(d_N, \ldots, d_{N-i}, x_N)$; and a transformation function, $t_i(d_i, x_i)$.
Figure 14-6 illustrates the basic structure.

Figure 14-6. Basic structure of a dynamic programming model.

The problem is solved by solving recursively the following:

$$\max_{d_i} R_i(x_i, d_i) = \max_{d_i} \, r_i(x_i, d_i) \circ R_{i-1}(x_{i-1})$$

such that

$$x_{i-1} = t_i(x_i, d_i), \quad i = 1, \ldots, N-1$$

or

$$\max_{d_i} R_i(x_i, d_i) = \max_{d_i} \, \{ r_i(x_i, d_i) \circ R_{i-1}(t_i(x_i, d_i)) \}$$

All variables, rewards, and transformations are supposed to be crisp.


14.3.1 Fuzzy Dynamic Programming with Crisp State Transformation 
Function 


In their famous paper, Bellman and Zadeh [1970] suggested for the first time
a fuzzy approach to this type of problem. Conceivably they based their
considerations on the symmetrical model of a decision as defined in definitions 14-1
and 14-2. The following terms will be used to define the fuzzy dynamic
programming model [Bellman and Zadeh 1970, B-151]:

$x_i \in X$, $i = 0, \ldots, N$: (crisp) state variable, where $X = \{\tau_1, \ldots, \tau_N\}$
is the set of values permitted for the state variables

$d_i \in D$, $i = 1, \ldots, N$: (crisp) decision variable, where $D = \{\alpha_1, \ldots, \alpha_m\}$
is the set of possible decisions

$x_{i+1} = t(x_i, d_i)$: (crisp) transformation function

For each stage $t$, $t = 0, \ldots, N-1$, we define:

1. a fuzzy constraint $\tilde C_t$ limiting the decision space and characterized by its
membership function $\mu_{\tilde C_t}(d_t)$

2. a fuzzy goal $\tilde G_N$ characterized by the membership function $\mu_{\tilde G_N}(x_N)$

The problem is to determine the maximizing decision

$$\tilde D^0 = \{d_i^0\}, \quad i = 0, \ldots, N, \quad \text{for a given } x_0$$

The Model. According to definition 14-1, the fuzzy set decision is the
"confluence" of the constraints and the goal(s), that is,

$$\tilde D = \bigcap_{t=0}^{N-1} \tilde C_t \cap \tilde G_N$$

Using the min-operator for the aggregation of the fuzzy constraints and the
goal, the membership function of the fuzzy set decision is

$$\mu_{\tilde D}(d_0, \ldots, d_{N-1}) = \min \{\mu_{\tilde C_0}(d_0), \ldots, \mu_{\tilde C_{N-1}}(d_{N-1}), \mu_{\tilde G_N}(x_N)\} \tag{14.20}$$

The membership function of the maximizing decision is then

$$\mu_{\tilde D^0}(d_0^0, \ldots, d_{N-1}^0) = \max_{d_0, \ldots, d_{N-2}} \, \max_{d_{N-1}} \, [\min \{\mu_{\tilde C_0}(d_0), \ldots, \mu_{\tilde G_N}(t_N(x_{N-1}, d_{N-1}))\}] \tag{14.21}$$

where $d_i^0$ denotes the optimal decision on stage $i$. If $K$ is a constant and $g$ is any
function of $d_{N-1}$, we can write

$$\max_{d_{N-1}} \min \{g(d_{N-1}), K\} = \min \{K, \max_{d_{N-1}} g(d_{N-1})\}$$

and equation (14.21) can be expressed as

$$\mu_{\tilde D^0}(d_0^0, \ldots, d_{N-1}^0) = \max_{d_0, \ldots, d_{N-2}} \min \{\mu_{\tilde C_0}(d_0), \ldots, \mu_{\tilde G_{N-1}}(x_{N-1})\} \tag{14.22}$$

with

$$\mu_{\tilde G_{N-1}}(x_{N-1}) = \max_{d_{N-1}} \min \{\mu_{\tilde C_{N-1}}(d_{N-1}), \mu_{\tilde G_N}(t_N(x_{N-1}, d_{N-1}))\} \tag{14.23}$$

We can thus determine $\tilde D^0$ recursively.


Example 14-9 [Bellman and Zadeh 1970, B-153] 


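Let $d_0$, $d_1$ be the two decision variables, the possible values of which can be
$\alpha_1$, $\alpha_2$. The state variables are $x_t$, $t = 0, \ldots, 2$, with a finite range
$X = \{\tau_1, \tau_2, \tau_3\}$.
The fuzzy constraints for $t = 0$ and $t = 1$ are

$$\tilde C_0(\alpha_i) = \{(\alpha_1, .7), (\alpha_2, 1)\}$$
$$\tilde C_1(\alpha_i) = \{(\alpha_1, 1), (\alpha_2, .6)\}$$

The fuzzy goal is specified as

$$\tilde G(x_2) = \{(\tau_1, .3), (\tau_2, 1), (\tau_3, .8)\}$$

and the crisp transformation function is defined by the following matrix, whose
entries are the ones used in the computations of the solution:

$$
\begin{array}{c|ccc}
x_{t+1} = t(x_t, d_t) & x_t = \tau_1 & x_t = \tau_2 & x_t = \tau_3 \\ \hline
d_t = \alpha_1 & \tau_1 & \tau_3 & \tau_1 \\
d_t = \alpha_2 & \tau_2 & \tau_1 & \tau_3
\end{array}
$$

Solution. Using equation (14.23) we can compute the fuzzy goal induced at
$t = 1$ as follows: We start at stage $t = 2$. The state-decision combinations that yield
$\tau_i$ on stage $t = 1$ are obtained from the above matrix.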
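So we can compute:

$$
\begin{aligned}
\mu_{\tilde G_1}(\tau_1) &= \max \{\min [\mu_{\tilde C_1}(\alpha_1), \mu_{\tilde G_2}(t(\tau_1, \alpha_1))],\ \min [\mu_{\tilde C_1}(\alpha_2), \mu_{\tilde G_2}(t(\tau_1, \alpha_2))]\} \\
&= \max \{\min [1, .3], \min [.6, 1]\} = \max \{.3, .6\} = .6 \quad \rightarrow d_1^0 = \alpha_2 \\
\mu_{\tilde G_1}(\tau_2) &= \max \{\min [1, .8], \min [.6, .3]\} = \max \{.8, .3\} = .8 \quad \rightarrow d_1^0 = \alpha_1 \\
\mu_{\tilde G_1}(\tau_3) &= \max \{\min [1, .3], \min [.6, .8]\} = \max \{.3, .6\} = .6 \quad \rightarrow d_1^0 = \alpha_2 \\
\mu_{\tilde G_0}(\tau_1) &= \max \{\min [.7, .6], \min [1, .8]\} = .8 \quad \rightarrow d_0^0 = \alpha_2 \\
\mu_{\tilde G_0}(\tau_2) &= \max \{\min [.7, .6], \min [1, .6]\} = .6 \quad \rightarrow d_0^0 = \alpha_1 \text{ or } \alpha_2 \\
\mu_{\tilde G_0}(\tau_3) &= \max \{\min [.7, .6], \min [1, .6]\} = .6 \quad \rightarrow d_0^0 = \alpha_1 \text{ or } \alpha_2
\end{aligned}
$$

Thus for

$x_0 = \tau_1$: $d_0^0 = \alpha_2$, $d_1^0 = \alpha_1$, with $\mu_{\tilde D^0} = .8$

$x_0 = \tau_2$: $d_0^0 = \alpha_1$, $d_1^0 = \alpha_2$ or $d_0^0 = \alpha_2$, $d_1^0 = \alpha_2$, both with $\mu_{\tilde D^0} = .6$

$x_0 = \tau_3$: $d_0^0 = \alpha_1$, $d_1^0 = \alpha_2$ or $d_0^0 = \alpha_2$, $d_1^0 = \alpha_2$, both with $\mu_{\tilde D^0} = .6$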
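The backward recursion of equation (14.23) for this example can be sketched in a few lines. The code below is illustrative only: the labels `a1`, `a2`, `t1`, `t2`, `t3` stand for the decisions and states of the example, and the transition table is the one implied by the min terms of the computations above.

```python
DECISIONS = ("a1", "a2")
STATES = ("t1", "t2", "t3")

# Fuzzy constraints for stages t = 0 and t = 1, and the fuzzy goal on x_2.
CONSTRAINTS = [{"a1": 0.7, "a2": 1.0}, {"a1": 1.0, "a2": 0.6}]
GOAL = {"t1": 0.3, "t2": 1.0, "t3": 0.8}

# Crisp transformation x_{t+1} = t(x_t, d_t), as used in the computations above.
TRANSITION = {
    ("t1", "a1"): "t1", ("t1", "a2"): "t2",
    ("t2", "a1"): "t3", ("t2", "a2"): "t1",
    ("t3", "a1"): "t1", ("t3", "a2"): "t3",
}

def backward_recursion(constraints, goal, transition, states, decisions):
    """Return the goal induced at stage 0 and the optimal policy per stage."""
    induced = dict(goal)
    policies = []
    for mu_c in reversed(constraints):          # stages N-1, ..., 0
        new_goal, policy = {}, {}
        for x in states:
            scores = {d: min(mu_c[d], induced[transition[(x, d)]])
                      for d in decisions}
            best = max(scores.values())
            new_goal[x] = best
            policy[x] = [d for d in decisions if scores[d] == best]
        induced = new_goal
        policies.insert(0, policy)              # keep policies in stage order
    return induced, policies

goal_0, policies = backward_recursion(
    CONSTRAINTS, GOAL, TRANSITION, STATES, DECISIONS)
print(goal_0)
print(policies)
```

Ties are kept as lists, so for the second and third initial states the policy for stage 0 contains both decisions, as in the example.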


14.4 Fuzzy Multicriteria Analysis 


In the recent past, it has become more and more obvious that comparing the desir- 
ability of different means of action, judging the suitability of products, or deter- 
mining “optimal” solutions in decision problems cannot be done in many cases 
by using a single criterion or a single objective function. This area, multicriteria 
decision making, has led to numerous evaluation schemes (e.g., in the areas of 
cost-benefit analysis and marketing) and to the formulation of vector-maximum 
problems in mathematical programming. 

Two major areas have evolved, both of which concentrate on decision making 
with several criteria: Multi Objective Decision Making (MODM) and Multi 
Attribute Decision Making (MADM). The main difference between these two 
directions is that the former concentrates on continuous decision spaces, primar- 
ily on mathematical programming with several objective functions, and the latter 
focuses on problems with discrete decision spaces. There are some exceptions 
to this rule (e.g., integer programming with multiple objectives), but for our 
purposes this distinction seems to be appropriate. 

The literature on multicriteria decision making has grown tremendously in the 
recent past. We shall only mention one survey reference for each of these two
areas: Hwang and Yoon [1981] for MADM and Hwang and Masud [1979] for
MODM. Fuzzy set theory has contributed to MODM as well as to MADM. We 
shall illustrate these contributions by describing one model in each of these areas. 
This topic has been treated in much more detail in the volume on fuzzy sets and 
decision analysis [Zimmermann 1987]. 


14.4.1 Multi Objective Decision Making (MODM) 


In mathematical programming, the MODM problem is often called the “vector- 
maximum” problem, and was first mentioned by Kuhn and Tucker [1951]. 


Definition 14-6 
The vector-maximum problem is defined as

$$\text{"maximize" } \{Z(x) \mid x \in X\}$$

where $Z(x) = (z_1(x), \ldots, z_k(x))$ is a vector-valued function of $x \in \mathbb{R}^n$ into
$\mathbb{R}^k$ and $X$ is the "solution space."

Two stages can generally be distinguished, at least categorically, in vector- 
maximum optimization: 


1. the determination of efficient solutions, and 
2. the determination of an optimal compromise solution. 


Definition 14-7 


Let "max" $\{Z(x) \mid x \in X\}$ be a vector-maximum problem as defined in definition
14-6. $\bar x$ is an efficient solution if there is no $\hat x \in X$ such that

$$z_i(\hat x) \ge z_i(\bar x), \quad i = 1, \ldots, k$$

and

$$z_i(\hat x) > z_i(\bar x) \quad \text{for at least one } i = 1, \ldots, k$$


The set of all efficient solutions is generally called the “complete solution.” 
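For a finite list of candidate solutions, definition 14-7 translates directly into a dominance filter. The following sketch is illustrative: the function names and the four objective vectors are assumptions chosen for the example, not data from the text.

```python
def dominates(z_a, z_b):
    """True if objective vector z_a is at least as good as z_b in every
    component and strictly better in at least one (maximization)."""
    pairs = list(zip(z_a, z_b))
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)

def efficient(points):
    """Keep the objective vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical objective vectors (z_1, z_2) of four candidate solutions.
candidates = [(14.0, 7.0), (-3.0, 21.0), (-3.0, 7.0), (9.61, 17.38)]
print(efficient(candidates))
```

Here the third vector is dominated by the first and is removed; the remaining three are mutually incomparable.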


Definition 14-8 


An optimal compromise solution of a vector-maximum problem is a solution
$x \in X$ that is preferred by the decision maker to all other solutions, taking into
consideration all criteria contained in the vector-valued objective function. It is
generally accepted that an optimal compromise solution has to be an efficient
solution according to definition 14-7.


In the following, we shall restrict our considerations to the determination of 
optimal compromise solutions in linear programming problems with vector- 
valued objective functions. 

Three major approaches are known to single out one specific solution from 
the set of efficient solutions which qualifies as an “optimal” compromise 
solution: 


1. the utility approach [see, e.g., Keeney and Raiffa 1976], 
2. goal programming [see, e.g., Charnes and Cooper 1961], and 
3. interactive approaches [see, e.g., Dyer 1973] 


The first two of these approaches assume that the decision maker can specify his 
or her “preference function” with respect to the combination of the individual 
objective functions in advance, either as “weights” (utilities) or as “distance func- 
tions” (concerning the distance from an “ideal solution,” for example). Generally 
these two approaches assume that the combination of the individual objective 
functions that arrives at the compromise solution with the highest overall utility 
is achieved by linear combinations (i.e., adding the weighted individual objec- 
tive functions). The third approach uses only local information in order to arrive 
at an acceptable compromise solution. 
The following example illustrates a fuzzy approach to this problem. 


Example 14-10 


A company manufactures two products 1 and 2 on given capacities. Product 1 
yields a profit of $2 per piece and product 2 of $1 per piece. Product 2 can be 
exported, yielding a revenue of $2 per piece in foreign countries; product 1 needs 
imported raw materials of $1 per piece. Two goals are established: (1) profit 
maximization and (2) maximum improvement of the balance of trade, that 
is, maximum difference of exports minus imports. This problem can be modeled 
as follows: 


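$$
\text{"maximize" } Z(x) =
\begin{pmatrix} -1 & 2 \\ 2 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
\quad
\begin{matrix} \text{(effect on balance of trade)} \\ \text{(profit)} \end{matrix}
$$

such that

$$
\begin{aligned}
-x_1 + 3x_2 &\le 21 \\
x_1 + 3x_2 &\le 27 \\
4x_1 + 3x_2 &\le 45 \\
3x_1 + x_2 &\le 30 \\
x_1, x_2 &\ge 0
\end{aligned}
$$

Figure 14-7. The vector-maximum problem.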


Figure 14-7 shows the solution space of this problem. The "complete solution"
is the edge $x^1 - x^2 - x^3 - x^4$. $x^1$ is optimal with respect to objective function
$z_1(x) = -x_1 + 2x_2$ (i.e., best improvement of balance of trade). $x^4$ is optimal with
respect to objective function $z_2(x) = 2x_1 + x_2$ (profit). The "optimal" values are
$z_1(x^1) = 14$ (maximum net export) and $z_2(x^4) = 21$ (maximum profit), respectively.
For $x^1 = (0; 7)^T$, total profit is $z_2(x^1) = 7$, and $x^4 = (9; 3)^T$ yields $z_1(x^4) = -3$,
that is, a net import of 3. Solution $x^5 = (3.4; 0.2)^T$ is the solution that yields
$z_1(x^5) = -3$, $z_2(x^5) = 7$, which is the lowest "justifiable" value of the objective
functions in the sense that a further decrease of the value of one objective function
cannot be balanced or even counteracted by an increase in the value of the other
objective function.

To solve problems of the kind shown in example 14—10, we can use the fol- 
lowing approach. We first assume that either the decision maker can specify aspi- 
ration levels for the objective functions or we define properties of the solution 
space for “calibration” of the objective functions. Let us consider the objective 
functions as fuzzy sets of the type “solutions acceptable with respect to objective 
function 1.” In example 14-10, we would have to construct two fuzzy sets: “Solu- 
tions acceptable with respect to objective function 1” and “solutions acceptable 
with respect to objective function 2.” As calibration points, we shall use the 
respective “individual optima” and the “least justifiable solution.” 

The membership functions $\mu_1(x)$ and $\mu_2(x)$ of the fuzzy sets characterizing the
objective functions rise linearly from 0 to 1 at the highest achievable values of
$z_1(x) = 14$ and $z_2(x) = 21$, respectively.

This means that we assume that the level of satisfaction with respect to the 
improvement of the balance of trade rises from 0 for imports of 3 units or more 
to 1 for exports of 14 and more; and the satisfaction level rises with respect to 
profit from 0 if the profit is 7 or less to 1 if total profit is 21 or more. 


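$$
\mu_1(x) =
\begin{cases}
0 & \text{for } z_1(x) \le -3 \\[4pt]
\dfrac{z_1(x) + 3}{17} & \text{for } -3 < z_1(x) \le 14 \\[4pt]
1 & \text{for } 14 < z_1(x)
\end{cases}
$$

$$
\mu_2(x) =
\begin{cases}
0 & \text{for } z_2(x) \le 7 \\[4pt]
\dfrac{z_2(x) - 7}{14} & \text{for } 7 < z_2(x) \le 21 \\[4pt]
1 & \text{for } 21 < z_2(x)
\end{cases}
$$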


We are now faced with a problem of type (14.3) in which crisp constraints have 
been added (i.e., the problem consists of two rows representing our fuzzified 
objectives and four crisp constraints). We can now employ formulation (14.12). 


Example 14-10 (continuation) 


In analogy to formulation (14.12) and including the crisp constraints, we arrive
at the following problem formulation:

$$
\begin{aligned}
\text{maximize } & \lambda \\
\text{such that } & \lambda \le -0.05882x_1 + 0.117x_2 + 0.1764 \\
& \lambda \le 0.1429x_1 + 0.0714x_2 - 0.5 \\
& 21 \ge -x_1 + 3x_2 \\
& 27 \ge x_1 + 3x_2 \\
& 45 \ge 4x_1 + 3x_2 \\
& 30 \ge 3x_1 + x_2 \\
& x \ge 0,
\end{aligned}
$$

depicted in figure 14-8.

The maximum degree of "overall satisfaction" ($\lambda_{\max} = 0.74$) is achieved for
the solution $x^0 = (5.03; 7.32)^T$. This is the "maximizing solution," which in our
example yields a profit of $17.38 and an export contribution of $9.61, the latter
being $z_1(x^0) = -5.03 + 2(7.32)$. The basic solutions $x^1$ and $x^4$ yield $\lambda = 0$.
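The degree of overall satisfaction of any candidate solution of this example can be evaluated directly from the two linear membership functions. The sketch below is a plausibility check under the calibration points given in the text (-3 and 14 for the balance of trade, 7 and 21 for profit); the helper names `ramp` and `overall_satisfaction` are introduced here for illustration.

```python
def z1(x1, x2):
    """Effect on the balance of trade."""
    return -x1 + 2.0 * x2

def z2(x1, x2):
    """Profit."""
    return 2.0 * x1 + x2

def ramp(value, low, high):
    """Linear membership rising from 0 at 'low' to 1 at 'high'."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)

def is_feasible(x1, x2, tol=1e-9):
    """Crisp constraints of example 14-10."""
    return (x1 >= -tol and x2 >= -tol
            and -x1 + 3 * x2 <= 21 + tol and x1 + 3 * x2 <= 27 + tol
            and 4 * x1 + 3 * x2 <= 45 + tol and 3 * x1 + x2 <= 30 + tol)

def overall_satisfaction(x1, x2):
    """Min-operator aggregation of the two fuzzified objectives."""
    return min(ramp(z1(x1, x2), -3.0, 14.0), ramp(z2(x1, x2), 7.0, 21.0))

for point in [(0.0, 7.0), (9.0, 3.0), (5.03, 7.32)]:
    print(point, is_feasible(*point), round(overall_satisfaction(*point), 3))
```

The two individual optima each drive one membership to zero, while the compromise point balances both memberships at roughly the same level.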





Figure 14-8. Fuzzy LP with min-operator.

In contrast to the usual vector-maximum models, the efficient solutions
contained in the "complete solution" are ordered (distinguishable) by their degree of
membership to the fuzzy set decision. It should be obvious that the approach
described above can only be applied if the "symmetrical model" of a decision
(definition 14-1) is accepted. Otherwise, we will have to use approaches
applicable to problem (14.13); these, however, will not be discussed in this volume.

At the beginning of section 14.2, many simplifying assumptions were pointed 
out that are generally accepted in traditional linear programming models. These 
assumptions concerned the use of real numbers rather than fuzzy numbers for the 
coefficients of linear programming as well as the use of crisp relations rather than 
fuzzy. One approach used in section 14.2 for the fuzzification of crisp mathe- 
matical programming problems seems to be computationally very efficient, well 
applicable in practice, and understandable by practitioners. In the literature, the
reader will find numerous different approaches that, from a mathematical point 
of view, are quite interesting. It would certainly exceed the scope of this book to 
describe the majority of these suggestions. We shall, however, mention a few of 
them. The reader will find quite a number of references to other approaches in 
the bibliography at the end of the book. 

Approaches that use fuzzy sets to describe the parameters of linear programming
models can be traced, in particular, to the paper by Negoita, Minoiu, and
Stan [1976]. They use fuzzy sets to describe the parameters of the matrix $A$ and
the capacity vector $b$ and then formulate for each $\alpha$ the respective $\alpha$-cuts. The
resulting crisp problem can then be solved by the usual LP codes. If the mem- 
bership functions have only a finite number of values, an optimal alternative and 
an objective function value can be determined for each case. This approach, 
however, is connected with a high computational effort. Afterwards the decision 
maker has to choose a desirable degree of membership and the associated solu- 
tion. Kacprzyk and Orlovski [1987], in their review article, mention a number of 
additional references in which special representations of fuzzy parameters are 
used. Here we shall mention only the work of Tanaka and Asai [1984], who use 
triangular membership functions, and Ramik and Rimanek [1985, 1989], who use 
fuzzy parameters in LR representation and replace each resulting fuzzy relation 
by four strict relations. 

Other authors consider nonlinear vector-maximum problems in which all 
parameters are defined fuzzily. Sakawa and Yano [1987], for instance, formulate 
a fuzzy nonlinear vector-maximum problem with fuzzy parameters $\tilde a_l$, $l = 1,
\ldots, k$, in the $k$ objective functions and $\tilde b_i$, $i = 1, \ldots, m$, in the $m$ constraints.
Here the fuzzy parameters are regarded as real-valued fuzzy numbers. For each
$\alpha$-degree, a crisp equivalent model can be formulated for which the values of the
fuzzy numbers can be considered as variables subject to the condition that they
belong to the fuzzy number at least with the degree of membership $\alpha$. Sakawa
and Yano [1987] define the notion of an $\alpha$-pareto-optimal solution in generalizing
the classical pareto-optimality with respect to the crisp equivalent models.
The authors suggest an interactive algorithm that leads the decision maker to a
satisfying solution. The decision maker has to provide as starting values the
desired $\alpha$ and the aspiration level for the objective function. The algorithm then
solves an equivalent model that minimizes for a given $\alpha$ the deviation from the
aspiration level and supplies additional trade-off information to the decision 
maker. This approach assumes that the decision maker can choose the states that 
are expressed in the fuzzy numbers. Therefore, this approach seems to be only 
suitable if the decision maker can really influence these values, that is, if they are 
not dependent on the environment. Because it is assumed that the parameters are
variables, the resulting $\alpha$-model is at least quadratic, even if the basic model is
linear.

If the fuzzy coefficients are the result of insufficient information that can be 
improved by additional effort, an optimal context-dependent allocation of
additional effort is of interest. Tanaka, Ichihashi, and Asai [1986] discuss the value
of additional information and suggest a model for the allocation of information 
on the basis of sensitivity analysis. In the recent past, fuzzy models have also 
been suggested for fractional programming, integer programming, geometric 
programming, and other versions of mathematical programming problems. Of 
particular interest is the application of possibility theory to mathematical pro- 
gramming suggested by Buckley [1988a, 1988b], and the papers by Arikan and 
Güngör [2001], Abd El-Wahed and Abo-Sinna [2001], and Jamison and Lodwick
[2001]. 


14.4.2 Multi Attributive Decision Making (MADM) 


The general multi attributive decision-making model can be defined as follows. 


Definition 14-9 


Let $X = \{x_i \mid i = 1, \ldots, n\}$ be a (finite) set of decision alternatives and
$G = \{g_j \mid j = 1, \ldots, m\}$ a (finite) set of goals according to which the desirability
of an action is judged. Determine the optimal alternative $x^0$ with the highest degree
of desirability with respect to all relevant goals $g_j$.

Most approaches in MADM consist of two stages: 


1. the aggregation of the judgments with respect to all goals and per decision 
alternative, and 

2. the rank ordering of the decision alternatives according to the aggregated
judgments.

In crisp MADM models, it is usually assumed that the final judgments of the
alternatives are expressed as real numbers. In this case, the second stage does not
pose any particular problems, and suggested algorithms concentrate on the first
stage. Fuzzy models are sometimes justified by the argument that the goals, $g_j$,
or their attainment by the alternatives, $x_i$, respectively, cannot be defined or judged
crisply but only as fuzzy sets. In this case, the final judgments are also
represented by fuzzy sets, which have to be ordered to determine the optimal
alternative. Then the second stage is, of course, far from trivial.

In the following, we shall describe two fuzzy MADM models—the first one, 
by Yager, because it shows very clearly the general structure of the problem and 
the second, by Baas and Kwakernaak, because many of the publications refer to 
this model, which is one of the first of this kind published. 


Model 14-1 [Yager 1978]


Let $X = \{x_1, \ldots, x_n\}$ be a set of alternatives. The goals are represented by the
fuzzy sets $\tilde G_j$, $j = 1, \ldots, m$. The "importance" (weight) of goal $j$ is expressed by
$w_j$; the "attainment" of goal $\tilde G_j$ by alternative $x_i$ is expressed by the degree of
membership $\mu_{\tilde G_j}(x_i)$.

The decision is defined in line with definition 14-1 as the intersection of all
fuzzy goals, that is,

$$\tilde D = \tilde G_1^{w_1} \cap \tilde G_2^{w_2} \cap \cdots \cap \tilde G_m^{w_m}$$


and the optimal alternative is defined as that achieving the highest degree of mem- 
bership in D. 


The rationale behind using the weights as exponents to express the importance 
of a goal can be found in definition 9-3: There the modifier “very” was defined 
as the squaring operation. Thus the higher the importance of a goal, the larger 
should be the exponent of its representing fuzzy set, at least for normalized fuzzy 
sets and when using the min-operator for the intersection of the fuzzy goals. Yager 
concentrates on the problem of determining the weights of the goals. As a solu- 
tion to that problem, he suggests Saaty’s hierarchical procedure for determining 
weights by computing the eigenvectors of the matrix M of relative weights of 
subjective estimates [Saaty 1978]: 


The membership grade in all objectives having little importance (w < 1) becomes larger, 
and while those in objectives having more importance (w > 1) become smaller. This 
has the effect of making the membership function of the decision subset D, which is 
the min value of each X over all objectives, being more determined by the important 
objectives, which is as it should be. Furthermore, this operation (min) makes particularly
small those alternatives that are bad in important objectives, therefore when we
select the x; that maximizes D, we will be very unlikely to pick one of these [Yager 
1978, p. 90]. 


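The solution procedure can now be described as follows: Given the set $X =
\{x_1, \ldots, x_n\}$ and the degrees of membership $\mu_{\tilde G_j}(x_i)$ of all $x_i$ in the fuzzy
sets $\tilde G_j$ representing the goals,

1. Establish by pairwise comparison the relative importance, $\alpha_j$, of the goals
among themselves. Arrange the $\alpha_j$ in a matrix $M$.

$$
M =
\begin{pmatrix}
\dfrac{\alpha_1}{\alpha_1} & \dfrac{\alpha_1}{\alpha_2} & \cdots & \dfrac{\alpha_1}{\alpha_m} \\[6pt]
\dfrac{\alpha_2}{\alpha_1} & \dfrac{\alpha_2}{\alpha_2} & \cdots & \dfrac{\alpha_2}{\alpha_m} \\[6pt]
\vdots & & & \vdots \\[2pt]
\dfrac{\alpha_m}{\alpha_1} & \dfrac{\alpha_m}{\alpha_2} & \cdots & \dfrac{\alpha_m}{\alpha_m}
\end{pmatrix}
$$

2. Determine consistent weights $w_j$ for each goal by employing Saaty's
eigenvector method.

3. Weight the degrees of goal attainment, $\mu_{\tilde G_j}(x_i)$, exponentially by the
respective $w_j$. The resulting fuzzy sets are $(\tilde G_j(x_i))^{w_j}$.

4. Determine the intersection of all $(\tilde G_j(x_i))^{w_j}$:

$$\tilde D = \{(x_i, \min_j \, (\mu_{\tilde G_j}(x_i))^{w_j}) \mid i = 1, \ldots, n;\ j = 1, \ldots, m\}$$

5. Select the $x_i$ with the largest degree of membership in $\tilde D$ as the optimal
alternative.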


Example 14-11 [Yager 1978, p. 94] 


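Let $X = \{x_1, x_2, x_3\}$, and let the goals be given as

$$
\begin{aligned}
\tilde G_1(x_i) &= \{(x_1, .7), (x_2, .5), (x_3, .4)\} \\
\tilde G_2(x_i) &= \{(x_1, .3), (x_2, .8), (x_3, .6)\} \\
\tilde G_3(x_i) &= \{(x_1, .2), (x_2, .3), (x_3, .8)\} \\
\tilde G_4(x_i) &= \{(x_1, .5), (x_2, .1), (x_3, .2)\}
\end{aligned}
$$

The subjective evaluations have resulted in the following matrix of weights:

$$
M =
\begin{array}{c|cccc}
 & G_1 & G_2 & G_3 & G_4 \\ \hline
G_1 & 1 & 3 & 7 & 9 \\
G_2 & \tfrac{1}{3} & 1 & 6 & 7 \\
G_3 & \tfrac{1}{7} & \tfrac{1}{6} & 1 & 3 \\
G_4 & \tfrac{1}{9} & \tfrac{1}{7} & \tfrac{1}{3} & 1
\end{array}
$$

Via Saaty's method, we obtain the vector $w = \{w_1, w_2, w_3, w_4\}$ as

$$w = \{2.32, 1.2, .32, .16\}$$

Exponential weighting of the $\tilde G_j(x_i)$ by their respective weights yields

$$
\begin{aligned}
(\tilde G_1(x_i))^{2.32} &= \{(x_1, .44), (x_2, .2), (x_3, .12)\} \\
(\tilde G_2(x_i))^{1.2} &= \{(x_1, .24), (x_2, .76), (x_3, .54)\} \\
(\tilde G_3(x_i))^{.32} &= \{(x_1, .6), (x_2, .68), (x_3, .93)\} \\
(\tilde G_4(x_i))^{.16} &= \{(x_1, .9), (x_2, .69), (x_3, .77)\}
\end{aligned}
$$

The fuzzy set decision $\tilde D$, as the intersection of the $(\tilde G_j(x_i))^{w_j}$, becomes

$$\tilde D = \{(x_1, .24), (x_2, .2), (x_3, .12)\}$$

and the optimal alternative is $x_1$ with a degree of membership in $\tilde D$ of
$\mu_{\tilde D}(x_1) = .24$.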
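The five steps of the procedure can be retraced for this example with a short script. This is a sketch under assumptions: the lower triangle of the pairwise comparison matrix is taken to be the reciprocal of the upper triangle, the principal eigenvector is approximated by plain power iteration (one common way to carry out the eigenvector step, not necessarily the computation used by Yager), and the weights are scaled so that they sum to the number of goals.

```python
GOALS = [
    {"x1": 0.7, "x2": 0.5, "x3": 0.4},
    {"x1": 0.3, "x2": 0.8, "x3": 0.6},
    {"x1": 0.2, "x2": 0.3, "x3": 0.8},
    {"x1": 0.5, "x2": 0.1, "x3": 0.2},
]

# Reciprocal pairwise comparison matrix (lower triangle assumed reciprocal).
PAIRWISE = [
    [1.0, 3.0, 7.0, 9.0],
    [1 / 3, 1.0, 6.0, 7.0],
    [1 / 7, 1 / 6, 1.0, 3.0],
    [1 / 9, 1 / 7, 1 / 3, 1.0],
]

def saaty_weights(matrix, iterations=100):
    """Approximate the principal eigenvector by power iteration and
    scale it so that the weights sum to the number of goals."""
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(row[k] * v[k] for k in range(n)) for row in matrix]
        total = sum(w)
        v = [value / total for value in w]
    return [n * value for value in v]

def yager_decision(goals, weights):
    """Steps 3 and 4: exponential weighting, then min-intersection."""
    alternatives = list(goals[0])
    return {x: min(mu[x] ** w for mu, w in zip(goals, weights))
            for x in alternatives}

computed_weights = saaty_weights(PAIRWISE)
decision = yager_decision(GOALS, [2.32, 1.2, 0.32, 0.16])   # weights from the text
best = max(decision, key=decision.get)                      # step 5
print([round(w, 2) for w in computed_weights])
print({x: round(mu, 2) for x, mu in decision.items()}, best)
```

With the weights quoted in the example the script reproduces the fuzzy set decision to two decimals. The power-iteration weights differ from the quoted ones only in the second decimal.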


Model 14-2 [Baas and Kwakernaak 1977] 


Let again $X = \{x_i \mid i = 1, \ldots, n\}$ be the set of alternatives and $G = \{g_j \mid j = 1,
\ldots, m\}$ the set of goals. $r_{ij}$ is the "rating" of alternative $i$ with respect to goal $j$,
and $w_j \in \mathbb{R}^1$ is the weight (importance) of goal $j$. It is assumed that the rating of
alternative $i$ with respect to goal $j$ is fuzzy and is represented by the membership
function $\mu_{\tilde R_{ij}}(r_{ij})$ on $\mathbb{R}^1$.

Similarly, the weight (relative importance) of goal $j$ is represented by a fuzzy
set $\tilde w_j$ with membership function $\mu_{\tilde w_j}(w_j)$. All fuzzy sets are assumed to be
normalized (i.e., have finite supports and take on the value 1 at least once!).

Step 1. The evaluation of an alternative $x_i$ is, by contrast to model 14-1,
assumed to be a fuzzy set that is computed on the basis of the $\tilde r_{ij}$ and $\tilde w_j$ as follows:
Consider a function $g: \mathbb{R}^{2m} \to \mathbb{R}$ defined by

$$g(z) = \frac{\sum_{j=1}^{m} w_j r_j}{\sum_{j=1}^{m} w_j} \tag{14.24}$$

with $z = (w_1, \ldots, w_m, r_1, \ldots, r_m)$.
On the product space $\mathbb{R}^{2m}$, a membership function $\mu_{\tilde Z_i}$ is defined as

$$\mu_{\tilde Z_i}(z) = \min \Big\{ \min_{j=1,\ldots,m} \mu_{\tilde w_j}(w_j),\ \min_{k=1,\ldots,m} \mu_{\tilde R_{ik}}(r_k) \Big\} \tag{14.25}$$

Through the function $g$, the fuzzy set $\tilde Z_i = (\mathbb{R}^{2m}, \mu_{\tilde Z_i})$ induces a fuzzy set
$\tilde R_i = (\mathbb{R}, \mu_{\tilde R_i})$ with the membership function

$$\mu_{\tilde R_i}(\bar r) = \sup_{z:\, g(z) = \bar r} \mu_{\tilde Z_i}(z), \quad \bar r \in \mathbb{R} \tag{14.26}$$

$\mu_{\tilde R_i}(\bar r)$ is the final rating of alternative $x_i$ on the basis of which the "rank
ordering" is performed in step 2.
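For fuzzy sets with finitely many support points, equations (14.24) to (14.26) can be evaluated by enumeration. The sketch below assumes such discretized weights and ratings, each stored as a dictionary from value to membership grade; the function name and the small data set are hypothetical, and continuous membership functions would need the special conditions mentioned later in the text.

```python
from itertools import product

def fuzzy_weighted_rating(weights, ratings):
    """Final fuzzy rating of one alternative by the extension principle.

    weights and ratings are lists (one entry per goal) of dictionaries
    {value: membership grade}. Returns {r_bar: grade} as in eq. (14.26).
    """
    m = len(weights)
    supports = [list(w.items()) for w in weights]
    supports += [list(r.items()) for r in ratings]
    result = {}
    for combo in product(*supports):
        ws, rs = combo[:m], combo[m:]
        total_weight = sum(w for w, _ in ws)
        if total_weight == 0:
            continue                      # g(z) is undefined here
        # eq. (14.24), rounded so equal ratings share one dictionary key
        r_bar = round(sum(w * r for (w, _), (r, _) in zip(ws, rs))
                      / total_weight, 6)
        grade = min(mu for _, mu in combo)                    # eq. (14.25)
        result[r_bar] = max(result.get(r_bar, 0.0), grade)    # eq. (14.26)
    return result

# Hypothetical data for two goals.
weights = [{1: 0.5, 2: 1.0}, {1: 1.0}]
ratings = [{0: 0.6, 1: 1.0}, {0: 1.0}]
final_rating = fuzzy_weighted_rating(weights, ratings)
print(final_rating)
```

The supremum in (14.26) becomes a maximum over all value combinations that map to the same weighted average.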

Step 2. For the final ranking of the $x_i$, Baas and Kwakernaak start from the
observation that if the $x_i$ had received crisp ratings $r_i$, then a reasonable procedure
would select the $x_i$ that have received the highest rating, that is, would determine
the set of preferred alternatives as $\{i \in I \mid r_i \ge r_j,\ \forall j \in I\}$, $I = \{1, \ldots, n\}$.

Since here the final ratings are fuzzy, the problem is somewhat more complicated.
The authors suggest in their model two different fuzzy sets in addition
to $\tilde R_i$, which supply different kinds of information about the preferability of an
alternative.

a. They first determine the conditional set $(\tilde I \mid \tilde R)$ with the characteristic
function

$$
\mu_{(\tilde I \mid \tilde R)}(i \mid \bar r_1, \ldots, \bar r_n) =
\begin{cases}
1 & \text{if } \bar r_i \ge \bar r_j \ \ \forall j \in I \\
0 & \text{else}
\end{cases}
\tag{14.27}
$$

This "membership function" expresses that a given alternative $x_i$ belongs to the
preferred set iff $\bar r_i \ge \bar r_j$ for all $j \in I$.

The final fuzzy ratings $\tilde R_i$ define on $\mathbb{R}^n$ a fuzzy set $\tilde R = (\mathbb{R}^n, \mu_{\tilde R})$ with the
membership function

$$\mu_{\tilde R}(\bar r_1, \ldots, \bar r_n) = \min_{i=1,\ldots,n} \mu_{\tilde R_i}(\bar r_i) \tag{14.28}$$

This fuzzy set together with the conditional fuzzy set (14.27) induces a fuzzy set
$\tilde I = (I, \mu_{\tilde I})$ with the membership function

$$\mu_{\tilde I}(i) = \sup_{\bar r_1, \ldots, \bar r_n} \big( \min \{\mu_{(\tilde I \mid \tilde R)}(i \mid \bar r_1, \ldots, \bar r_n),\ \mu_{\tilde R}(\bar r_1, \ldots, \bar r_n)\} \big) \tag{14.29}$$

which can be interpreted as the degree to which alternative $x_i$ is the best
alternative. If there is a unique $i$, then $x_i$ corresponds to the alternative that maximizes
equation (14.29) if the $w_j$ and $r_{ij}$ are set to the values at which $\mu_{\tilde w_j}(w_j)$ and
$\mu_{\tilde R_{ij}}(r_{ij})$, respectively, attain their supremum, namely 1.

b. This is, of course, not all the information that can be provided. $x_i$ might not
be the unique best alternative, but there might be some $x_j$ attaining their
maximum degree of membership at $r^*$. They might, however, be represented
by different fuzzy sets $\tilde R_j$.


Baas and Kwakernaak therefore try to establish another criterion that might
be able to distinguish such "preferable" alternatives from each other and rank
them:

If the final ratings are crisp, $\bar r_1, \ldots, \bar r_n$, then

$$p_i = h_i(\bar r_1, \ldots, \bar r_n) = \bar r_i - \frac{1}{n-1} \sum_{\substack{j=1 \\ j \ne i}}^{n} \bar r_j$$

for fixed $i$, can be used as a measure of preferability of alternative $x_i$ over all
others.
If the ratings $\tilde R_i$ are fuzzy, then the mapping $h_i: \mathbb{R}^n \to \mathbb{R}$ induces a fuzzy set
$\tilde P_i = (\mathbb{R}, \mu_{\tilde P_i})$ with the membership function

$$\mu_{\tilde P_i}(p) = \sup_{h_i(\bar r_1, \ldots, \bar r_n) = p} \mu_{\tilde R}(\bar r_1, \ldots, \bar r_n) \tag{14.30}$$

in which $\mu_{\tilde R}$ is defined by equation (14.28).

This fuzzy set can be used to judge the degree of preferability of $x_i$ over all other
alternatives.
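Equations (14.28) to (14.30) can likewise be evaluated by enumeration when the final ratings have finitely many support points. The following sketch makes that discretization assumption; the function names and the two final ratings are hypothetical, and the preferability measure is the difference between a rating and the mean of the other ratings.

```python
from itertools import product

def joint_grades(final_ratings):
    """Yield ((r_1, ..., r_n), grade) with the grade from eq. (14.28)."""
    supports = [list(r.items()) for r in final_ratings]
    for combo in product(*supports):
        yield tuple(r for r, _ in combo), min(mu for _, mu in combo)

def best_alternative_grades(final_ratings):
    """Degree to which each alternative is the best one, eq. (14.29)."""
    n = len(final_ratings)
    grades = [0.0] * n
    for point, mu in joint_grades(final_ratings):
        top = max(point)
        for i in range(n):
            if point[i] >= top:           # indicator of eq. (14.27)
                grades[i] = max(grades[i], mu)
    return grades

def preferability(final_ratings, i):
    """Fuzzy preferability of alternative i over the others, eq. (14.30)."""
    n = len(final_ratings)
    result = {}
    for point, mu in joint_grades(final_ratings):
        others = sum(point) - point[i]
        p = round(point[i] - others / (n - 1), 6)
        result[p] = max(result.get(p, 0.0), mu)
    return result

# Hypothetical discretized final ratings of two alternatives.
final_ratings = [{0.2: 0.5, 0.5: 1.0}, {0.4: 1.0, 0.6: 0.3}]
print(best_alternative_grades(final_ratings))
print(preferability(final_ratings, 0))
```

In this small data set the first alternative is best with degree 1, because the peaks of the two ratings (0.5 and 0.4) already order the alternatives.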

The computational aspects for determining all the fuzzy sets mentioned above 
shall not be discussed here; models 1 and 2 have been described because of their 
illustrative value. Baas and Kwakernaak mention and prove special conditions 
for the membership functions to make computations possible. 

To summarize: Three kinds of information are provided:

1. $\mu_{\tilde R_i}(\bar r)$ as the fuzzy rating of $x_i$,
2. $\mu_{\tilde I}(i)$ as the degree to which $x_i$ is the best alternative, and
3. $\mu_{\tilde P_i}(p)$ as the degree of preferability of $x_i$ over all other alternatives.

Example 14-12 [Baas and Kwakernaak 1977, p. 54]


Let X = {x), X2, x3} be the set of available alternatives and G = {g), 82, 83, g4} the 
set of goals. The weights and the ratings of the alternatives with respect to the 
goals are given as normalized fuzzy sets that resemble the terms of a linguistic 
variable (see definition 9-1). Figure 14-9 depicts the fuzzy sets representing 
weights and ratings. Table 14-1 gives the assumed ratings for all alternatives and 
goals and the respective weights. Figure 14-10 shows the ug(7;) (final ratings for 
alternatives x1, X2, X3). 

The degrees of membership of the alternatives to the fuzzy set $(I, \mu_{\tilde{I}})$, that is,
the degrees to which alternatives $x_i$ are best, are


Alternative $x_i$    $\mu_{\tilde{I}}(x_i)$

1                    .95
2                    1
3                    .17


The fuzzy set $\tilde{P}_2(p)$ indicating the degree to which alternative 2 is preferred to
all others is shown in figure 14-11. $p_2$ is calculated as $p_2 = r_2 - \frac{1}{2}(r_1 + r_3)$.

Many other fuzzy methods and models have been suggested to solve the 
MADM problem. They differ by their assumptions concerning the input data and 
by the measures used for aggregation and ranking. Also, they concentrate either 
on the first step (aggregation of ratings), or the second step (ranking), or both. 
Obviously all of them have advantages and disadvantages. They will, however, 
not be discussed here but will be in the second volume. 

An interesting example of a more engineering-type application of multicriteria
decision making using fuzzy sets is described by Munoz-Rodriguez and
Cattermole [1987].

Figure 14-9. Fuzzy sets representing weights and ratings.


Table 14-1. Ratings and weights of alternative goals.


                                    Rating $\tilde{r}_{ij}$ for alternative $x_i$
Goal $g_j$   Weight $w_j$             i = 1     i = 2          i = 3
1            very important         good      very good      fair
2            moderately important   poor      poor           poor
3            moderately important   poor      fair to good   fair
4            rather unimportant     good      not clear      fair


FUZZY SET THEORY—AND ITS APPLICATIONS 


[Plot: final-rating membership curves for Alternative 1, Alternative 2, and Alternative 3; vertical axis ticks at 0.2, 0.4, 0.6, 0.8.]

Figure 14-10. Final ratings of alternatives.

Figure 14-11. Preferability of alternative 2 over all others.


Exercises 


1. Explain the (mathematical) difference between the symmetric and nonsym- 
metric model of a decision in a fuzzy environment. 

2. Consider example 14-4. What grade would the student get if the “and” was
interpreted as the “bold intersection” (definition 3-6), the “bounded
difference” (definition 3-8), or the “bold union”?

3. Consider the following problem: 

Minimize   $z = 4x_1 + 5x_2 + 2x_3$
such that  $3x_1 + 2x_2 + 2x_3 \le 60$
           $3x_1 + x_2 + x_3 \le 30$
           $2x_2 + x_3 \ge 10$
           $x_1, x_2, x_3 \ge 0$


Determine the optimal solution. Now assume that the decision maker has
the following preferences:
a. He has a linear preference function for the objective function between the
minimum and 1.5.
b. The tolerance intervals can be established as


$p_1 = 10$, $p_2 = 12$, $p_3 = 3$


Now use model (14.9) to determine the optimal solution and compare it 
with the crisp optimal solution. 
4. Solve the example of exercise 3 by assuming the objective function to be 
crisp and by using equation (14.18). 
5. Consider the problem: 


“maximize”  $Z(x) = \begin{pmatrix} -x_1 - 3x_2 \\ 1.5x_1 + 2.5x_2 \end{pmatrix}$
such that   $-x_1 + 2x_2 \le 18$
            $4x_1 + 3x_2 \le 40$
            $3x_1 + x_2 \le 25$
            $x_1, x_2 \ge 0$


Determine an optimal compromise solution by using the model from 
example 14-10 (continuation). 
6. What is the optimal alternative in the following situation (use Yager’s
method!)?


Alternatives: $X = \{x_1, x_2, x_3, x_4\}$

Goals: $G_1(x_i) = \{(x_1, .8), (x_2, .6), (x_3, .4), (x_4, .2)\}$
       $G_2(x_i) = \{(x_1, .4), (x_2, .6), (x_3, .6), (x_4, .8)\}$
       $G_3(x_i) = \{(x_1, .6), (x_2, .8), (x_3, .8), (x_4, .6)\}$


The relative weights of the goals have been established as $G_1 : G_2 : G_3 = 1 : 4 : 6$.


15 APPLICATIONS OF
FUZZY SETS IN ENGINEERING
AND MANAGEMENT


15.1 Introduction 


The scope of applications of fuzzy sets—increasingly together with neural nets— 
is very large and still growing continuously. The closer the problem is to human 
evaluation, intuition, perception, and decision making, the less dichotomous is 
the problem structure and the more relevant and promising is the application of 
fuzzy technology. 

In addition, one should realize that we have moved from a lack of
computer-readable data to an abundance of data, in which human beings are
often unable to detect the information that is relevant and valuable to them.
Obviously there is an increasing need to reduce complexity by compacting
data. This is the reason for the increasing importance of (intelligent) data
mining methods and tools. Web technology is opening completely new areas of
application. If a model of a real problem does not consist of crisply defined
mathematical statements and relations (if it is, for instance, a verbal model or
a model containing fuzzy sets, fuzzy numbers, fuzzy statements, or fuzzy
relations), then traditional mathematical methods cannot be applied directly.
Either fuzzy algorithms can be applied, that is, algorithms that can deal with
fuzzy entities or whose procedure is described “fuzzily”, or one has to find
crisp mathematical models that are in some specific sense equivalent to the
original fuzzy model and to which available crisp algorithms can then be applied.

All cases in which fuzzy set theory is properly used as a modeling tool are 
characterized by four features: 


1. Fuzzy phenomena, relations, or evaluations are modeled by a well-defined 
and founded theory. (There is nothing fuzzy about fuzzy theory!) 

2. By doing so, a better approximation of real phenomena by formal models is 
achieved. 

3. A better modeling of real phenomena normally requires more and more 
detailed information—more, in fact, than is needed for rather rough dichoto- 
mous modeling. 

4. The amount of computer readable data is too large to be comprehended by 
a human observer. 


When talking about “applications”, different things can be meant: 


1. One can “apply” one theory to another: for instance, one can apply fuzzy set 
theory to linear programming, which yields another theory, namely, fuzzy 
linear programming. 

2. One can apply one theory to a model, which is an abstract picture of a pos- 
sible real problem situation: the application of fuzzy set theory to inventory 
models, for instance, represents such an application. 

3. One can apply a theory or a model to a real problem and solve it as well as 
possible. 


We have considered applications of the first kind in chapters 9 and 10 and partly
in chapters 12, 13, and 14. This chapter is dedicated to applications of types 2
and 3, where the existence of an application of type 2 often triggers one of
type 3.

The theory of fuzzy sets has already been applied to quite a number of
operations research problems. As can be expected for a theory of this age, the
majority of these “applications” are applications to “model problems” rather than to
real-world problems. Exceptions are the areas of classification (structuring),
control, logistics, and blending. For these areas there is already considerable
software commercially available. The same is true for planning languages (decision
support systems), for instance, in the area of financial planning. The reader should
realize that the lack of real applications cannot necessarily be blamed on the
theory. A real application of a theory normally requires that the practitioner
who has the problem to be solved be familiar with and understand,
or at least accept, the theoretical framework before the theory can really be
applied. This obviously takes some more time.

Real applications, particularly the commercially successful ones, very often 
either are not published or are published after a long delay. This is partially due 
to competitive considerations and partly to the fact that practitioners normally do 
not consider publications as one of their prime concerns. 

Nevertheless, for a textbook and for practitioners, applications of types 2 and 3
are important since, as already mentioned, the knowledge of a type 2 application
may trigger either other type 2 applications or type 3 applications.

The number of disciplines in which fuzzy sets are applied is increasing
steadily. So far the main areas are (in alphabetical order and not in order of
importance): actuarial science, business administration and management, chemistry,
earth sciences, ecology and environmental science, economics, engineering (civil,
industrial, mechanical, nuclear, etc.), ergonomics, information technology,
medicine, social sciences, telecommunication, and traffic management.

It would obviously exceed the scope of this textbook to cover the majority of
these areas. Therefore, two areas were selected that probably exhibit the most
applications: engineering and management. Fuzzy applications in these areas are
increasingly known by the terms “business intelligence” and “engineering
intelligence”. Table 15-1 shows which applications are described in various chapters
of this book.


15.2 Engineering Applications 


Many, if not most, engineering applications of fuzzy sets use the principle of fuzzy
control that was studied in chapter 11. Hence, applications of this type were
already described in the fuzzy control chapter. A second, large class of
applications is located in (static) data analysis and hence was studied in chapter 13.
There are, however, numerous engineering applications that use other features
or methods of fuzzy set theory. Examples can be found in [Kno and
Cohen 1998], [Levner et al. 1998], [Jones and Hua 1998], [Gasos and Rosetti
1999], [Chen et al. 1998]. Some detailed descriptions and surveys can also be
found in [Zimmermann 1999].

Most of these applications require too long a description to be included in this
textbook. To describe the essentials of engineering applications outside fuzzy
control, we have chosen two examples: one showing a linguistic multicriteria
analysis and one that uses dynamic fuzzy pattern recognition as described
in chapter 13.

15.2.1 Linguistic Evaluation and Ranking of Machine Tools
[Devedzić and Pap 1999]


The approach suggested by Devedzić and Pap is much more broadly applicable. It shall
be illustrated, however, using a central part of their experimental study.
A metal cutting process generally is preceded by the following planning cycle:


(a) selection of processing method, 

(b) operations selection and sequencing, 

(c) machine tools selection, 

(d) tooling selection, 

(e) machining parameters selection and determination, 
(f) tool path determination and calculation, 

(g) NC programming, and 

(h) cost and process economy calculations. 


We shall concentrate on machine tool selection. In particular, we shall focus
on the modeling of machine tool rigidity.

Rigidity has a clear mechanical definition and can be precisely determined
for each machine tool element as well as for the machine tool as a whole.
However, this approach is often applied only in the design stage;
during the operating period this characteristic is evaluated qualitatively
as “high rigidity”, “medium rigidity”, “low rigidity”, etc. These linguistic values
are provided by skilled personnel based on experience, intuition, and/or recently
presented evidence. Metal cutting is a highly dynamic process that is
influenced by numerous factors. The ultimate goal of a machining process is to
produce a workpiece that meets the requested dimensional accuracy and surface
quality.

One of the main characteristics of machine tools is their capability to meet
workpiece requirements. On the other hand, the machining process can be
evaluated through realized productivity and economy. In reality, machine
tool characteristics and output features can be indirectly perceived and
represented through machine tool rigidity, which is often used as an integral
qualitative feature of machine tool condition and capability. For the linguistic
modeling of machine tool rigidity, Devedzić and Pap used information and data from
the literature and from experience, together with the results of a laboratory experiment.

Table 15-2 shows the measured data on which the experiment is based, and
table 15-3 shows the surface quality parameters that can be reached with four
lathes, B, C, D, and E.

Figure 15-1 depicts the terms of the linguistic variable “rigidity”, where
“norm(IT)” stands for a normalized surface quality.

Table 15-2. Experimental data.


Machine                    Turret lathe
Cutting tool               PTGNR 2525 M16
Workpiece/material         Alloyed steel bar (D = 52 mm, L = 500 mm)/AISI 8620
Cutting speed v (m/min)    80-350
Cutting feed s (mm/rev)    0.1-0.5
Cutting depth δ (mm)       1-3


Table 15-3. Surface quality parameters (output data).


        R_a      R_max    R_z                   R_a rel   R_m rel   R_z rel
Lathe   (μm)     (μm)     (μm)     Cut. depth   (%)       (%)       (%)
B       1-8      6-60     5-19     δ_1          83        68        35
C       1-8      7-39     4-20     δ_2          89        65        71
D       0.6-7    3-37     3-13     δ_3          79        74        52
E       1-10     9-39     5-25     δ_4          67        76        78

Note: R_a rel = R_a min / R_a (c.p.); R_m = R_max; δ_1 = 1 mm, δ_2 = 2 mm, δ_3 = 2.5 mm, δ_4 = 3 mm;
R_a: arithmetical average deviation from center line; R_max: maximum height of surface roughness
(microirregularities); R_z: mean height of surface roughness (microirregularities).


[Plot: membership functions of the terms very high, high, medium, low, and very low of “rigidity” over norm(IT).]

Figure 15-1. Linguistic values for variable “rigidity”.

The membership functions were checked empirically and proved to be
acceptable. They are represented by the following triangular or trapezoidal membership
functions:


$\mu_{VH} = [.33, .5, .5, .58]$
$\mu_{H} = [.42, .58, .58, .75]$
$\mu_{M} = [.5, .67, .67, .83]$
$\mu_{L} = [.58, .75, .75, .92]$
$\mu_{VL} = [.75, .83, 1, 1]$

Total machine tool rigidity depends on the partial rigidity of all its elements. Among
them, three basic assemblies have the greatest influence: main spindle,
tailstock center, and toolholder.

It is clear that the significance of each of these elements is not the same. 
Therefore, it is quite useful to create a procedure for the generation of linguis- 
tic values mentioned above (figure 15-1), based on partial linguistic evalua- 
tion of rigidity value and significance of each element for given machining 
conditions. 

Interviews with experts showed that in the case of partial evaluation of ele- 
ments’ rigidity values usually three linguistic values have been used (figure 15-2). 
Trapezoidal fuzzy numbers representing values of the linguistic variable “element 
rigidity” are defined as 


$\mu_{HS} = [.35, .35, .7, .8]$
$\mu_{MS} = [.7, .8, .8, .9]$
$\mu_{LS} = [.8, 1, 1, 1]$


Figure 15-2. Linguistic values for variable “elements’ rigidity”.




Furthermore, using fuzzy sets representing values of the variable “elements’ rigidity”
and empirical rules defining total machine tool rigidity, fuzzy sets determining
the significance of machine tool elements’ rigidity have been defined (figure 15-3).
The total number of empirical rules is very large and depends on the cardinality of
the input term sets.

For the determination of the values of the linguistic variable “significance”, 
however, realistic boundary conditions allow the reduction of the rules to those 
shown in table 15-4: 

The membership functions of the terms of the linguistic variable “significance”
(high, medium, and low) were defined as:


$\mu_{HS} = [.9, .9, .965, .978]$
$\mu_{MS} = [.49, .894, .894, .949]$
$\mu_{LS} = [.512, 1, 1, 1]$


They are shown in figure 15-3.

Table 15-4. Boundary values of the linguistic variable “significance”.


Element    Value    Significance    Rigidity

Figure 15-3. Linguistic values for variable “significance”.

The determination of a linguistic value of rigidity is based on the above
“dictionaries” of linguistic terms for “elements’ rigidity” and “significance”. The
element evaluations are aggregated to an evaluation for each lathe
by using a weighted average of the element evaluations, i.e., a type of
aggregation that was already mentioned in chapter 14 in multiattribute decision making.
For each lathe $j \in \{B, C, D, E\}$ the evaluation is


$\mu_j = \sum_{i} \mu_{S_i} \, \mu_{E_i}$


where index i denotes the three elements: 


i = 1 = rigidity of main spindle 
i = 2 = rigidity of tailstock center 
i= 3 = rigidity of tool holder. 


Figure 15-4 shows the results for the four lathes: 

The ranking of the four fuzzy sets characterizing the lathes can now be per- 
formed by using any of the methods mentioned in chapter 14 or, for instance, by 
any of the methods compared by [Bortolan and Degain 1998]. The authors of 
this experiment use the center-of-gravity defuzzification and arrive at the order 
{B, D, E, C}. 
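
Center-of-gravity defuzzification of a trapezoidal fuzzy number can be sketched as follows. The closed-form centroid is standard geometry; the lathe names and quadruples in the usage note are hypothetical and are not the experimental results. Whether a larger or a smaller centroid is better depends on the orientation of the scale, so the sort direction is left as a parameter.

```python
def centroid(a, b, c, d):
    """x-coordinate of the center of gravity of the trapezoid [a, b, c, d] (height 1)."""
    denom = 3.0 * (d + c - a - b)
    if denom == 0.0:
        # Degenerate case: the fuzzy number is a single crisp value.
        return float(a)
    return (d * d + c * c + c * d - a * a - b * b - a * b) / denom


def rank_by_centroid(evaluations, descending=True):
    """Order the names in `evaluations` by the centroid of their fuzzy evaluation."""
    return sorted(
        evaluations,
        key=lambda name: centroid(*evaluations[name]),
        reverse=descending,
    )
```

With hypothetical evaluations such as `{"lathe_1": (.5, .6, .7, .8), "lathe_2": (.3, .4, .4, .5), "lathe_3": (.6, .8, .8, 1.0)}`, `rank_by_centroid` orders the lathes by their centroids 0.65, 0.4, and 0.8.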


Figure 15-4. Linguistic evaluation values of lathes B, C, D, E.

15.2.2 Fault Detection in Gearboxes [Joentgen et al. 1999] 


The problem is that of automatic fault detection in gearboxes (in this case,
a helicopter gearbox). The methods used are those of fuzzy dynamic data analysis
described in section 13.3.

The state-dependent maintenance of machines is a strategy to increase the
availability of machines and, simultaneously, to improve the planning of
downtimes. One prerequisite is the precise and reliable monitoring of a machine’s states.
Since continuous machine monitoring by a trained expert is very time consuming
and expensive, various systems for automatic diagnosis have been developed.
They have been successfully applied in different areas, e.g., for diagnosis of
electric motors [Fogliardi 1997], tape deck chassis [Fochem, Wischnewski, and
Hofmeier 1997], saw blades [Brandt et al. 1996], roller bearings [Fochem,
Joentgen, and Geropp 1997], household appliances [Weber, Wischnewski, and
Fochem 1997], etc. All these diagnostic systems use vibration analysis [Geropp 1995]
to detect faults in machine components. Vibration analysis is based on the examination
of solid-borne signals measured at different places on a machine during
operation. Changes in a machine’s state lead to changes in the vibration signal. Using
either expert knowledge or preliminary knowledge, it can be decided whether a
change in the machine’s state is due to a fault or to changes in some operating
parameters.

Many of the above mentioned diagnostic systems use classification methods 
such as fuzzy c-means [Bezdek 1981] and fuzzy Kohonen networks [Tsao, 
Bezdek, and Pal 1994] to recognize different states of a machine. These methods 
require a selection of features which are relevant for the recognition of faults. 
The feature selection is usually based on an expert’s knowledge and is crucial for 
the classification results. 

In this application the functional fuzzy c-means algorithm (FFCM), described
in section 13.3, is used for automatic fault detection in gearboxes based on measured
vibration signals. This algorithm is suited for the classification of dynamic
objects, i.e., objects described by trajectories of their features. This paper shows
how to apply the FFCM algorithm for early recognition of changes of state as well
as for feature selection without any requirement for expert knowledge.

The subject of investigation in this paper is an intact gearbox. It is observed 
over a period of time during operation under a constant load. Vibration signals 
are measured at different positions on the gearbox each minute. 

The task of the analysis is to monitor the state of the gearbox and to recog- 
nize significant changes in its state based on the vibration signal. 

The experiment covered a period of approximately 114 hours (6,830
minutes). To reduce the resulting data set, only every 10th measurement was taken
into consideration (i.e., minutes 1, 11, 21, etc.). Carrying out the analysis for
shifted data (e.g., minutes 5, 15, 25, etc.) does not change the final results.

During preprocessing for each point of time the vibration signal measured 
was converted into a frequency spectrum containing 1024 values using Fourier 
transformation. 
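
The preprocessing step can be sketched with a direct discrete Fourier transform. This is only an illustration: the text does not state how many samples each measurement contains or how the amplitudes were scaled, so the normalization by the signal length and the function name `amplitude_spectrum` are assumptions. For signals of realistic length a fast Fourier transform routine would be used instead of this quadratic-time loop.

```python
import cmath
import math


def amplitude_spectrum(signal, n_bins):
    """Amplitudes of the first n_bins frequency bins of a real-valued signal.

    Uses the plain DFT definition; each amplitude is divided by the signal length.
    """
    n = len(signal)
    spectrum = []
    for k in range(n_bins):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) / n)
    return spectrum
```

Calling the function with `n_bins=1024` on each measured signal would yield one spectrum of 1024 values per point of time, provided the signal contains at least 2048 samples.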

Thus, at each of 682 points of time a frequency spectrum consisting of 1024 
values was given for the analysis. The brightness of the points in the figure rep- 
resents the measured amplitude of a certain frequency at a corresponding point 
of time. To better illustrate the given data set, the upper bound of the scaling was 
set to 1.0, although some amplitude values exceed this upper bound. 

In the course of the experiment a defect occurred in the gearbox. This
defect finds expression in the higher frequency amplitudes in the time interval
between 500 and 640. Due to this defect, the gearbox was turned off at point 630
and repaired. According to experts, first symptoms of the defect can be
recognized retroactively from point 320 on, provided that the whole figure of the data
set is at the expert’s disposal.

In this application, the objects to classify are states of the gearbox. Each state
is described by one feature, i.e., a frequency spectrum. Each frequency spectrum
at a given point of time is considered as a trajectory. Thus, there are in total 682
objects or trajectories, which can be clustered using the functional fuzzy c-means.
The resulting class centers, which are frequency spectra, then represent typical
states of the gearbox. Depending on the time interval chosen for the analysis, two
procedures of classifier design can be distinguished:


• Clustering of all states of the gearbox from the beginning of the experiment
till the current point (incremental classifier design);
• Clustering of the n latest states of the gearbox (rolling classifier design).


The following sections discuss these two types of the design procedure in more 
detail. First some general remarks on the chosen approach are given. 

At point $t$, $c_t$ classes, which are typical states, are known. Now a classifier with
$c_t + 1$ classes is designed. The shapes of the membership functions are then used
to decide whether a new class has occurred. Typically, if there is no new class,
two or more of the membership functions are almost identical. In the other case
a new class is discovered and has to be labeled.
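
The decision rule can be sketched as follows. The sketch uses the standard fuzzy c-means membership formula and a plain Euclidean distance between trajectories as a stand-in for the trajectory distance of the functional fuzzy c-means; the tolerance `tol` and all function names are hypothetical, and the class centers are assumed to have been computed already.

```python
import math


def distance(traj_a, traj_b):
    """Euclidean distance between two trajectories of equal length (assumed measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(traj_a, traj_b)))


def fcm_memberships(objects, centers, m=2.0, eps=1e-12):
    """Fuzzy c-means membership of every object in every class (rows sum to 1)."""
    exponent = 2.0 / (m - 1.0)
    table = []
    for obj in objects:
        # eps avoids a division by zero when an object coincides with a center.
        d = [max(distance(obj, center), eps) for center in centers]
        table.append([1.0 / sum((d_i / d_j) ** exponent for d_j in d) for d_i in d])
    return table


def nearly_identical(table, class_a, class_b, tol=0.05):
    """True if the membership functions of two classes agree within tol everywhere."""
    return all(abs(row[class_a] - row[class_b]) <= tol for row in table)
```

If `nearly_identical` holds for some pair of classes, the additional class carries no new information and no new state is reported; otherwise the new class has to be labeled.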


Incremental Classifier Design. During incremental design of the classifier, the 
total information about a machine’s states obtained from the beginning of the 
experiment until the current moment is used. In the course of the experiment 
available information is constantly supplemented with new data. For each point 
of time a classifier is designed based on information available so far. 




Since the fuzzy c-means, as well as many other clustering methods, has
difficulty correctly recognizing classes with very different numbers of objects, new
states of a machine are recognized only if they have appeared very often or differ
very much from known states. Therefore, changes of state are discovered rather
late.

In this application 682 objects or trajectories consisting of 1024 values were
clustered using the functional fuzzy c-means. At the beginning of the
experiment, differences between frequency spectra can hardly be found. These spectra
constitute the only class “State: intact”. In the following, two classes are
looked for.

The first change in the state of the gearbox appeared when the gearbox was
started anew at point 230 after it was turned off for a while. The changes in the
frequency spectra (measured with the distance measure for trajectories described
in [Joentgen, Mikenina, Weber, and Zimmermann 1999]) are so large that these
newly appearing spectra are recognized as a separate class, although their number
is very small. From this time on, two typical states “State: intact” and “State: new
start” are known. Therefore, in the following, three classes are looked for.

The third class “State: defective” is recognized approximately from point 440 
on. Degrees of membership of objects to this class exceed all other degrees of 
membership after point 340 (figure 15-5). This means that at point 440 a fault in 
operation is recognized retroactively, starting at point 340. 

Such late fault detection compared to the expert’s statement is due to the
above-mentioned drawback of the fuzzy c-means and can be explained by the large
number of objects representing class 1 “State: intact”, which were observed in
the time interval from 0 to 300. It should be noted that measurements at points
far in the past are not significant for the current fault detection.
Considering them in the computations deteriorates the classification results.


Rolling Classifier Design. To avoid the problems related to different sizes of
classes, the analysis of a machine’s states can be carried out periodically using time
windows each covering 100 points. This means that only the data of the last 100
points of time are used for classifier design and classification. The size of the time
windows was chosen arbitrarily. If the windows are too large, older states that
persist over a long time period prevent early recognition of new states; if they
are too small, older states may be forgotten and the system tends to recognize
new states too often. For the sake of simplicity, only every 10th time window is
considered.

In the following, the time window from point of time $t_1$ until point $t_2$ is denoted
by $(t_1, t_2)$.
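
The two design procedures differ only in which points enter each clustering run. The sketch below enumerates the windows $(t_1, t_2)$ for both, using the window width of 100 points and the step of 10 points mentioned in the text; the function names and the choice of 0 as the first point are assumptions.

```python
def rolling_windows(n_points, width=100, step=10):
    """Windows (t1, t2) of fixed width whose start advances by `step` points."""
    return [(t1, t1 + width) for t1 in range(0, n_points - width + 1, step)]


def incremental_windows(n_points, step=10):
    """Windows (0, t2) that always start at the beginning of the experiment."""
    return [(0, t2) for t2 in range(step, n_points + 1, step)]
```

For the 682 points of the experiment, `rolling_windows(682)` contains, among others, the windows (140, 240), (220, 320), and (230, 330) that are discussed in this section.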

At the beginning of the experiment it is supposed that there exists only one
class “State: intact”. Therefore, the FFCM is used to look for two classes.
For time windows before window (140, 240) clustering results in two class
centers, which can hardly be distinguished. Therefore, they can be considered as
variations of class “State: intact”.

Starting the gearbox anew at point 233, which appears at first in time window 
(140, 240), is recognized as a new class and labeled as “State: new start”. Because 
now two typical classes of the gearbox are known, the algorithm will look for 
three classes in the following. 

The new start of the gearbox at point 314, which is at first observed in time 
window (220, 320), leads to such large variations in the frequency spectra that 
the class “State: new start” is split into two classes. Therefore, from this point on 
four classes will be looked for. 

In time window (230, 330) the fault is not recognized yet. Figure 15—6 shows 
the membership functions of the objects to the four calculated classes. Points at 
which the gearbox was started anew (points 233 and 314) can easily be recognized. 
At these points, degrees of membership of objects to class 3 are 1 whereas degrees 
of membership to the other two classes are 0. As stated above, restarting the 
gearbox at point 314 leads to the formation of two classes. Based on the member- 
ship functions shown in figure 15—6 it is not possible to distinguish clearly between 
classes 1 and 2. The centers of classes 1 and 2 are also almost identical. Therefore, 
it is assumed that classes 1 and 2 represent two parts of the class “State: intact”. 

Changes in the membership functions for classes 1 and 2 can first be
recognized in time window (240, 340) (figure 15-7). After starting the gearbox anew
at point 314 it is possible to distinguish between membership functions of classes
1 and 2. Considering the shapes of these functions in the whole time window,
one can notice a decreasing and an increasing trend. The membership function
with a decreasing trend corresponds to class 1 “State: intact” whereas the one
with an increasing trend corresponds to class 2 “State: defective”.

Figure 15-6. Membership functions for time window (230, 330).

Figure 15-8. Membership functions for time window (250, 350).

These trends are even stronger in the next time window (250, 350), shown in 
figure 15-8. Membership functions characterizing classes 1 and 2 can be distin- 
guished even better than in figure 15-5 and figure 15-6. 

The defect in the gearbox can be recognized from point 340 on. This defect
is detected 20 points (i.e., 200 min.) later than should be possible according to
the expert’s statement. It should be noted that the diagnosis by an expert can
also be carried out only retroactively. Thus, the functional fuzzy c-means allows
an early on-line fault detection in the gearbox.

Figure 15-9. Proportional difference between class centers 1 and 2 (with respect
to the center of class 2) in time window (250, 350).


Significant differences between the centers of classes 1 and 2 can be found at 
those spectral lines, at which bright vertical lines start or finish. These differences 
with respect to the center of class 2 are illustrated in figure 15-9. 

Peaks in figure 15—9 correspond to characteristic frequencies, which are rele- 
vant for fault detection. 


Refinement of the Analysis. The analysis described in the two previous sec- 
tions was based on just 10% of the available data. In this section it will be inves- 
tigated whether the use of all data can lead to an earlier detection of the change 
from “State: intact” to “State: defective”. During on-line state-monitoring all 
these data would be available and could be taken into account. 

As was shown above, the functional fuzzy c-means can recognize the
defective state of the gearbox from point 314 on (i.e., from the 3140th minute on). For
a more precise analysis, now the time interval from minute 2900 to minute 3300
is considered. To remain consistent with previous calculations, the analysis is
carried out for time windows each covering 100 points.

The considered time interval contains 36 minutes (starting at point 3114)
during which the gearbox was turned off and started anew. This action represents an
external disturbance of the gearbox. The measured data are not relevant for a
description of a gearbox’s state during normal operation. Therefore they can be
excluded from the analysis. The succeeding points of time are shifted backwards
by 36 minutes. In the following, two classes will be looked for.

Figure 15-10. Membership functions for time window (3014, 3114).

Figure 15-11. Membership functions for time window (3064, 3200).

In time window (3014, 3114) (just before starting the gearbox anew) it is not
yet possible to distinguish two classes. The corresponding membership functions
are shown in figure 15-10.

Clear recognition of two classes is possible in time window (3064, 3200) (dis- 
regarding the 36 minutes which start at point 3114). The corresponding mem- 
bership functions are illustrated in figure 15-11. 

As described above, the fault could not be found using time window (230, 
330), i.e., until minute 3300 the fault was not detected. Using all available data 
the fault is detected about two hours earlier (around minute 3200). 

The application presented in this chapter shows that, using the functional fuzzy 
c-means, automatic fault detection in gearboxes can be successfully carried out 
based on vibration signals. Moreover, combining the rolling classifier design with 
this method allows early fault detection. When typical states of the gearbox 
are recognized, they must be judged by an expert. Beyond that, the method does 
not require any expert knowledge. 


15.3 Applications in Management 


The borderline between engineering and management applications is fuzzy. Many 
of the functions (such as scheduling, maintenance, layout planning, simultaneous 
engineering, etc.), which are actually management functions, are performed by 
engineers. In universities they are also partly taught in management schools and 
partly in industrial engineering. The underlying mathematical structures differ 
very often. While engineering problems normally are characterized by nonlinear 
functions and by fewer variables, management problems are generally modeled 
linearly and they are very large in terms of the number of variables and con- 
straints. There are, of course, exceptions to this rule: there are, for instance, more 
problems with a combinatorial character in management than in engineering and 
they are certainly hard to solve. Recent problems that require data mining have 
increased in importance and they seem to be more relevant in the management 
area than in the engineering area. The reason may be that engineering problems 
are very often more limited in scope, while management problems generally 
have to take into consideration the entire enterprise for which masses of data are 
stored in data warehouses. 

In the following we will present fuzzy set applications in the main areas of 
management. The examples are selected in such a way that the most important areas 
as well as the most important methodological approaches are covered. Typical 
managerial problems, such as the determination of creditworthiness, which are 
described in other chapters, are not included again. 


15.3.1 A Discrete Location Model [Darzentas 1987] 


For quite a number of years, there has been a widespread interest in location 
models. For specific types of these problems, excellent review papers exist. One 
of the most popular models is the “simple plant location model” (SPLP) for 
which, for instance, Krarup and Pruzan [1983] summarize the existing literature 
through the mid-1980s. In this paper, the authors also establish some relationships 
between SPLP, other location problems, set-covering problems, and integer 
programming. One of the problems, the discrete location problem (DLP), can be 
formulated as a set-covering problem and principally solved by pure zero-one 
programming algorithms. In this type of problem, a number of facilities are to 
be located at specific points within an area, according to precisely quantified 
criteria. This results in a districting, that is, a plan that shows where the faci- 
lities have to be located and what locations they serve. However, in many 
location problems, especially those associated with social policies, noncrisply 
defined criteria are used such as how “near” or “accessible” a facility is or how 
“important” certain issues are, etc. In these cases, a fuzzy sets approach is more 
appropriate. 

In such a problem, the decision maker’s main task is the identification and 
evaluation of criteria on the basis of which an optimum will be obtained. The 
choice of specific locations can only be based on questions like: 


• How “far” should people travel to reach a service point? 

• How “important” are “bad” and “good” roads and public transport? 

• Is “homogeneity” of social class and income within a subset important? 

• Is it “very unfair” to locate two major facilities in one point? 


The fuzzy nature of the problem can be accepted and introduced at various stages 
in the analysis. 

There are two major obstacles to finding “optimal” solutions to DLPs. The first 
is that it is necessary but difficult to define all possible covers, that is, 
subsets of locations, which have to enter even the crisp DLP model. Readers who 
are not acquainted with this type of problem are referred to the above-mentioned 
paper by Krarup and Pruzan or to the work of Darzentas [1987, pp. 330-337]. The 
second obstacle is the “evaluation” of the covers in order to select the best one. 

The aim of a location project is easy to state: find the “best” districting—which 
means that the objective itself is a fuzzy set. There may also be a number of 
restrictions, such as “the budget allows for approximately M facilities” or “it is 
preferable that village i serves village m,” and vice-versa, or “it is very impor- 
tant that i and j belong to the same district,” and so on. Hence constraints can be 
formulated as fuzzy sets. 

In a crisp model, the determination of the optimal districting can be performed 
by using integer programming algorithms. If the problem is of reasonable size, 
heuristic versions have to be used. 

In fuzzy DLPs, possibly even with multiple criteria, this approach is not pos- 
sible. One could then use either fuzzy integer programming (see, for example, 
Fabian and Stoica [1984] or Zimmermann and Pollatschek [1984]), or one could 
try to reduce the number of possible districtings to a reasonable size by elimi- 
nating nonfeasible and dominated covers. The remaining covers could be evalu- 
ated with respect to relevant criteria (yielding a fuzzy set for each criterion) and 
then ordered in analogy to methods described in section 14.4. 


Example 15-1 


Consider the road network shown in figure 15-12, which is part of a real road 
network. The points 1,..., 4 represent villages whose populations are given in 
table 15-5a. The distances between the villages are given in table 15-5b. The 
problem is to optimally locate three facilities in order to serve (cover) each village 
with only one facility. This problem in its nonfuzzy form can be formulated as a 
set-partitioning problem. The fuzzy version of the problem can be formulated as 
a symmetric fuzzy-decision model (see definition 14-1). 

Suppose the three covers shown in figure 15-13 are the only covers feasible 
due to crisp constraints, which are omitted here. In figure 15-13, the villages 
hosting a facility are hatched. For the determination of the “best” cover, the grades 
of membership of all three covers to every fuzzy criterion are rated. These ratings 
and the fuzzy criteria are given in table 15-6. In this example, the degrees of 
membership of the covers in the fuzzy set “decision” are obtained using the min- 
operator. These degrees imply an order on the set of covers. If a crisp decision 


Table 15-5a. Populations. 

Village   Population 
1             1,100 
2               650 
3             1,350 
4               730 

Table 15-5b. Distances between villages (miles). 

Village     1     2     3     4 
1           -    11     7     9 
2          11     -     -    14 
3           7     -     -     - 
4           9    14     -     - 

Figure 15-12. Road network. 

Figure 15-13. Feasible covers. 

Table 15-6. Determination of the fuzzy set decision. 

                                                   Covers 
Fuzzy criterion                                 c1    c2    c3 
It is a better policy to locate this type 
of facility in villages with high 
population:                                     .9    .8    .7 
The facilities should not be located in 
polluted areas:                                 .6    .5    .2 
The distance between a village 
without a facility and a facility should 
not exceed 8 miles considerably:                .6    .9    .6 
Membership values of the decision:              .6    .5    .2 


has to be made, the cover with the maximum degree of membership ($c_1$, 
$\mu(c_1) = .6$) is chosen. 
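
The aggregation used in example 15-1 can be retraced in a few lines. The following is only a minimal sketch: the ratings are those of table 15-6, while the dictionary layout, the short criterion keys, and the function names are illustrative assumptions and not part of the original study.

```python
# Ratings from table 15-6: grade of membership of each cover in each
# fuzzy criterion. The criterion keys are shorthand chosen for this sketch.
RATINGS = {
    "c1": {"high population": 0.9, "not polluted": 0.6, "within 8 miles": 0.6},
    "c2": {"high population": 0.8, "not polluted": 0.5, "within 8 miles": 0.9},
    "c3": {"high population": 0.7, "not polluted": 0.2, "within 8 miles": 0.6},
}


def decision_membership(ratings):
    """Aggregate the criteria of every cover with the min-operator."""
    return {cover: min(grades.values()) for cover, grades in ratings.items()}


def best_cover(ratings):
    """Return the cover with the maximum membership in the fuzzy decision."""
    decision = decision_membership(ratings)
    best = max(decision, key=decision.get)
    return best, decision[best]


if __name__ == "__main__":
    print(decision_membership(RATINGS))
    print(best_cover(RATINGS))
```

Replacing `min` by another aggregation operator would change the induced order on the covers, which is why the choice of operator is stated explicitly in the example.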


15.3.2 Fuzzy Set Models in Logistics 


OR has been applied extensively to the area of logistics in the past. In the 
following, two applications of fuzzy set theory are presented. First, we show 
the “fuzzification” of a standard problem in OR: the transportation problem. 
Second—as an example of existing projects—we show a decision support system 
based on a fuzzy model. 


15.3.2.1 Fuzzy Approach to the Transportation Problem [Chanas et al. 
1984]. The analysis of “fuzzy counterparts” of linear programming problems 
of some special structure—for example, problems of flows in networks, trans- 
portation problems, and so on—appears to be an interesting task. The following 
model considers a transportation problem with fuzzy supply values of the sup- 
pliers and with fuzzy demand values of the receivers. For the solution of the 
problem, parametric programming is used. 

Figure 15-14. The trapezoidal form of a fuzzy number 
$\tilde{a}_i = (a_i^1, \underline{a}_i^1, a_i^2, \bar{a}_i^2)$. 

Figure 15-15. The membership function of the fuzzy goal $\tilde{G}$. 


Model 15-1 

\[
\begin{aligned}
\text{minimize} \quad & c = \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij} x_{ij} \\
\text{such that} \quad & \sum_{j=1}^{n} x_{ij} \cong \tilde{a}_i, \quad i = 1, 2, \ldots, m \\
& \sum_{i=1}^{m} x_{ij} \cong \tilde{b}_j, \quad j = 1, 2, \ldots, n \\
& x_{ij} \geq 0, \quad i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n
\end{aligned}
\]



$\tilde{a}_i$ and $\tilde{b}_j$ denote nonnegative fuzzy numbers of trapezoidal form. Note the slight 
difference between definition 5-3 and the definition shown in figure 15-14, which 
is only used for this section. The value of $\mu_{\tilde{a}_i}(\sum_j x_{ij})$ 
($\mu_{\tilde{b}_j}(\sum_i x_{ij})$) is interpreted as a feasibility degree of the solution 
with respect to the ith (jth) constraint in model 15-1. With the objective 
function of model 15-1, a fuzzy number $\tilde{G}$ is associated, expressing the 
“admissible” total transportation costs. The membership function, 
$\mu_{\tilde{G}}$, of $\tilde{G}$ is assumed to be of the form 

\[
\mu_{\tilde{G}}(x) =
\begin{cases}
1 & \text{for } x < c_0 \\
f(x) & \text{for } x \geq c_0
\end{cases}
\]

where f(x) is a continuous function, decreasing to zero and achieving the value 1 
for $x = c_0$ (see figure 15-15). In particular, f(x) may be a linear function. 
$\mu_{\tilde{G}}(x)$ determines the degree of the decision maker’s satisfaction with the 
achieved level of the total transportation costs. 

Model 15-1 now can be reduced to the symmetrical decision model 15-2, 
assuming goal and constraints are aggregated via the min-operator. 


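Model 15-2 

\[
\begin{aligned}
\text{maximize} \quad & \lambda \\
\text{such that} \quad & \mu_{\tilde{G}}(c(x)) \geq \lambda \\
& \mu_{\tilde{a}_i}\Bigl(\sum_{j} x_{ij}\Bigr) \geq \lambda, \quad i = 1, 2, \ldots, m \\
& \mu_{\tilde{b}_j}\Bigl(\sum_{i} x_{ij}\Bigr) \geq \lambda, \quad j = 1, 2, \ldots, n \\
& \lambda \geq 0, \quad x_{ij} \geq 0
\end{aligned}
\]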


Here, however, this problem shall be solved by parametric programming. For each 
level of a constraint’s fulfillment $\lambda$, $\lambda \in [0, 1]$, one has to find the 
cheapest transportation plan. This plan satisfies the goal $\tilde{G}$ to the maximum 
degree for the respective $\lambda$. Hence in analogy to definition 14-5 and 
example 14-7, we shall determine 

\[
\max_x \{ \mu_{\tilde{G}}(x) \wedge \mu_{\tilde{C}}(x) \}
\]

where $\mu_{\tilde{C}}(x)$ will first be determined by an appropriate linear programming model. 
Here the min-operator is assumed to be acceptable. For the subsequent aggregation 
of $\mu_{\tilde{G}}(x)$ and $\mu_{\tilde{C}}(x)$, any nondecreasing operator and any decreasing function 
for f(x) can be employed. Let us first turn to the determination of $\mu_{\tilde{C}}(x)$: The 
parameter of our parametric LP shall be denoted by r, $r \in [0, 1]$, and rather than 
determining $\lambda$-cuts we shall consider (1 - r)-cuts. Using the definition given in figure 
15-14 for the fuzzy numbers specifying supplies and demands, the (1 - r)-cuts 
are intervals of the form: 

\[
\begin{aligned}
\tilde{a}_i^{\,1-r} &= \{ x \mid \mu_{\tilde{a}_i}(x) \geq 1 - r \} = [\, a_i^1 - r\,\underline{a}_i^1,\ a_i^2 + r\,\bar{a}_i^2 \,] \\
\tilde{b}_j^{\,1-r} &= \{ x \mid \mu_{\tilde{b}_j}(x) \geq 1 - r \} = [\, b_j^1 - r\,\underline{b}_j^1,\ b_j^2 + r\,\bar{b}_j^2 \,]
\end{aligned}
\]

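Our problem can then be modeled as follows: 


Model 15-3 

\[
\begin{aligned}
\text{minimize} \quad & \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij} x_{ij} \\
\text{such that} \quad & \sum_{j=1}^{n} x_{ij} \in [\, a_i^1 - r\,\underline{a}_i^1,\ a_i^2 + r\,\bar{a}_i^2 \,], \quad i = 1, 2, \ldots, m \\
& \sum_{i=1}^{m} x_{ij} \in [\, b_j^1 - r\,\underline{b}_j^1,\ b_j^2 + r\,\bar{b}_j^2 \,], \quad j = 1, 2, \ldots, n \\
& x_{ij} \geq 0, \quad r \in [1 - \bar{r}, 1]
\end{aligned}
\]

where $\bar{r} = \sup_x \mu_{\tilde{a} \cap \tilde{b}}(x)$, with $\tilde{a}$ the joint supply and $\tilde{b}$ the joint 
demand, that is, the maximum value of $\mu_{\tilde{C}}(x)$ that can be achieved 
for a given r. Solving this model either as a parametric LP or with special 
algorithms for parametric transportation models, we obtain $\mu_{\tilde{G}}(r)$ for 
$r \in [1 - \bar{r}, 1]$. This can now be combined with $\mu_{\tilde{C}}(r)$ to define the membership 
function of the fuzzy set “decision.” 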


Example 15-2 [Chanas et al. 1984] 


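There are two suppliers with supply values 

\[
\tilde{a}_1 = (10, 5, 10, 5) \quad \text{and} \quad \tilde{a}_2 = (16, 5, 16, 5)
\]

(triangular fuzzy numbers) and three receivers with demand values 

\[
\tilde{b}_1 = (10, 5, 10, 5), \quad \tilde{b}_2 = (9, 4, 9, 4), \quad \tilde{b}_3 = (1, 1, 1, 1)
\]

(also triangular fuzzy numbers), respectively. The unit transport costs are 

\[
\begin{array}{lll}
c_{11} = 10 & c_{12} = 20 & c_{13} = 30 \\
c_{21} = 20 & c_{22} = 50 & c_{23} = 60
\end{array}
\]

The membership function of the fuzzy goal is linear: 

\[
\mu_{\tilde{G}}(x) =
\begin{cases}
1 & \text{for } x < 300 \\
\dfrac{800 - x}{500} & \text{for } x \in [300, 800] \\
0 & \text{for } x \geq 800
\end{cases}
\]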


Model 15-3 for this example becomes: 

\[
\begin{aligned}
\text{minimize} \quad & c = 10x_{11} + 20x_{12} + 30x_{13} + 20x_{21} + 50x_{22} + 60x_{23} \\
\text{such that} \quad & x_{11} + x_{12} + x_{13} \geq 10 - 5r \\
& x_{11} + x_{12} + x_{13} \leq 10 + 5r \\
& x_{21} + x_{22} + x_{23} \geq 16 - 5r \\
& x_{21} + x_{22} + x_{23} \leq 16 + 5r \\
& x_{11} + x_{21} \geq 10 - 5r \\
& x_{11} + x_{21} \leq 10 + 5r \\
& x_{12} + x_{22} \geq 9 - 4r \\
& x_{12} + x_{22} \leq 9 + 4r \\
& x_{13} + x_{23} \geq 1 - r \\
& x_{13} + x_{23} \leq 1 + r \\
& x_{ij} \geq 0 \quad \forall\, i, j
\end{aligned}
\]

Table 15-7 shows the parametric transportation problem table. Column FR 
denotes a “fictitious” receiver, row FD a “fictitious” supplier, and M a large real 
number. The rows and columns without an asterisk correspond to the suppliers 
and receivers having supply and demand values settled at the minimum level. In 
this section the FD and FR are blocked by assigning a large transport cost M to 
their cells. The rows and columns with an asterisk correspond to the maximum 
surplus of the product that may be sent additionally (but not necessarily, and 
therefore the respective transport costs to the “fictitious” receiver and suppliers 
are equal to zero) if the constraints are to be satisfied at least to the degree 1 - r. 

It should be observed that the joint supply value of all the suppliers is equal 
to $\tilde{a} = (26, 10, 26, 10)$ and the joint demand value of all the receivers is equal to 
$\tilde{b} = (20, 10, 20, 10)$. The maximum degree to which the constraints could be satisfied 
is equal to $\bar{r} = .7$. Therefore the relevant interval for analysis is $r \in [.3, 1]$. 
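
The value .7 can be retraced as the height of the intersection of the joint supply and the joint demand. The sketch below assumes the four-component form of figure 15-14 (left end of the core, left spread, right end of the core, right spread) with linear flanks, and it assumes that fuzzy numbers of this form are added component by component; the function names are illustrative only.

```python
def add_fuzzy(*numbers):
    """Add fuzzy numbers given as (core_lo, left_spread, core_hi, right_spread)."""
    return tuple(sum(column) for column in zip(*numbers))


def intersection_height(first, second):
    """Height of the intersection of two trapezoidal fuzzy numbers."""
    f_lo, f_left, f_hi, f_right = first
    s_lo, s_left, s_hi, s_right = second
    if f_lo <= s_hi and s_lo <= f_hi:
        return 1.0                      # the cores overlap
    if f_lo > s_hi:
        # `first` lies to the right: its left flank meets the right flank of `second`
        gap, spread = f_lo - s_hi, f_left + s_right
    else:
        # `first` lies to the left: its right flank meets the left flank of `second`
        gap, spread = s_lo - f_hi, f_right + s_left
    if spread == 0:
        return 0.0
    return max(0.0, 1.0 - gap / spread)


if __name__ == "__main__":
    joint_supply = add_fuzzy((10, 5, 10, 5), (16, 5, 16, 5))
    joint_demand = add_fuzzy((10, 5, 10, 5), (9, 4, 9, 4), (1, 1, 1, 1))
    r_bar = intersection_height(joint_supply, joint_demand)
    print(joint_supply, joint_demand, r_bar)
```

Here the cores are 6 units apart and the two facing spreads sum to 20, so the height is 1 - 6/20, and the parameter r has to be at least 1 minus this height for the model to be feasible.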


Table 15-7. Table of the parametric transportation problem. 

                                  Receivers 
Suppliers   1         2        3       1*     2*     3*     FR        Supply 
1           10        20       30      10     20     30     M         10 - 5r 
2           20        50       60      20     50     60     M         16 - 5r 
1*          10        20       30      10     20     30     0         10r 
2*          20        50       60      20     50     60     0         10r 
FD          M         M        M       0      0      0      0         20r 
Demand      10 - 5r   9 - 4r   1 - r   10r    8r     2r     6 + 20r 

Table 15-8. Solution to transportation problem. 

          .3 ≤ r ≤ 1/3     1/3 ≤ r ≤ .6     .6 ≤ r ≤ 1 
x12       3 + 14r          9 - 4r           9 - 4r 
x13       7 - 19r          1 - r            1 - r 
x21       10 + 5r          10 + 5r          16 - 5r 
x22       6 - 10r          6 - 10r 


The solution of this example is shown in table 15-8. The membership function 
$\mu_{\tilde{G}}(r)$ takes the form 

\[
\mu_{\tilde{G}}(r) =
\begin{cases}
.06 + 1.38r & \text{for } r \in [.3, \tfrac{1}{3}] \\
.18 + 1.02r & \text{for } r \in [\tfrac{1}{3}, .6] \\
.54 + 0.42r & \text{for } r \in [.6, 1]
\end{cases}
\]

The maximizing solution is obtained for r = .4059 and $\mu_{\tilde{G}}(.4059) = .5941$. Figure 
15-16 depicts this situation in analogy to figure 14-5. 
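
The numbers of example 15-2 can be checked with a short script. This is only a sketch under stated assumptions: the shipment plans are read from table 15-8 as x12, x13, x21, x22 over the intervals [.3, 1/3], [1/3, .6], and [.6, 1], variables not listed there are taken to be zero, and the degree of constraint satisfaction is taken as 1 - r. All function names are illustrative.

```python
# Unit transport costs of example 15-2.
COST = {"x11": 10, "x12": 20, "x13": 30, "x21": 20, "x22": 50, "x23": 60}


def plan(r):
    """Shipments read from table 15-8 (variables not listed are zero)."""
    if r <= 1 / 3:
        return {"x12": 3 + 14 * r, "x13": 7 - 19 * r,
                "x21": 10 + 5 * r, "x22": 6 - 10 * r}
    if r <= 0.6:
        return {"x12": 9 - 4 * r, "x13": 1 - r,
                "x21": 10 + 5 * r, "x22": 6 - 10 * r}
    return {"x12": 9 - 4 * r, "x13": 1 - r, "x21": 16 - 5 * r}


def total_cost(r):
    """Total transportation cost of the plan for parameter r."""
    return sum(COST[name] * amount for name, amount in plan(r).items())


def mu_goal(cost):
    """Linear membership function of the fuzzy goal."""
    if cost <= 300:
        return 1.0
    if cost >= 800:
        return 0.0
    return (800 - cost) / 500


def mu_decision(r):
    """Min-aggregation of goal satisfaction and constraint satisfaction 1 - r."""
    return min(mu_goal(total_cost(r)), 1 - r)


def best_r(lo=0.3, hi=1.0, tol=1e-10):
    """Bisection on mu_goal(total_cost(r)) - (1 - r), which increases with r."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mu_goal(total_cost(mid)) < 1 - mid:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    r_star = best_r()
    print(round(r_star, 4), round(mu_decision(r_star), 4))
```

Because the goal satisfaction increases with r while the constraint satisfaction 1 - r decreases, the maximum of their minimum lies where the two curves cross, which is what the bisection locates.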


15.3.2.2 Fuzzy Linear Programming in Logistics. Ernst [1982] suggests a 
fuzzy model for the determination of time schedules for containerships, which 
can be solved by branch and bound, and a model for the scheduling of contain- 
ers on containerships, which results eventually in an LP. We shall only consider 
the last model (a real project). 

The model contained in a realistic setting approximately 2,000 constraints and 
originally 21,000 variables, which could then be reduced to approximately 500 
variables. Thus it could be handled adequately on a modern computer. It is 
obvious, however, that a description of this model in a textbook would not be 
possible. We shall, therefore, sketch the contents of the modeling verbally and 
then concentrate on the aspects that included fuzziness. 

The system is the core of a decision support system for the purpose of sched- 
uling properly the inventory, movement, and availability of containers, especially 
empty containers, in and between 15 harbors. The containers were shipped 
according to known time schedules on approximately 10 big containerships 
worldwide on 40 routes. The demand for container space in those harbors was to 
a high extent stochastic. Thus the demand for empty containers in different 
harbors could either be satisfied by large inventories of empty containers in all 
harbors, causing high inventory costs, or they could be shipped from their 
locations to the locations where they were needed, causing high shipping costs and 
time delays. 

Figure 15-16. The solution of the numerical example. 

Thus the system tries to control optimally primarily the movements and inven- 
tories of empty containers, given the demand in ports, the available number of 
containers, the capacities of the ships, and the predetermined time schedule of 
the ships. 

This problem was formulated as a large LP model. The objective function 
maximized profit (from shipping full containers) minus cost of moving empty 
containers minus inventory costs of empty containers. When comparing data of 
past periods with the model, it turned out that very often ships had transported 
more containers than their specific maximum capacity. This, after further inves- 
tigations, led to a fuzzification of the ship’s capacity constraints, which will be 
described in the next model. 


Model 15-4 [Ernst 1982, p. 90] 


Let 

$z = c^T x$         the net profit to be maximized 
$Bx \leq b$         the set of crisp constraints 
$Ax \lesssim d$     the set of capacity constraints for which a crisp formulation turned 
                    out to be inappropriate 

Then the problem to be solved is 

\[
\begin{aligned}
\text{maximize} \quad & z = c^T x \\
\text{such that} \quad & Ax \lesssim d \\
& Bx \leq b \\
& x \geq 0
\end{aligned}
\tag{15.1}
\]

This corresponds to model (14.14). Rather than using model (14.19) to arrive 
at a crisp equivalent LP model, the following approach was used: Based on 
equation (14.10) and model (14.11), the following membership functions were defined 
for those constraints that were fuzzy: 

\[
\mu_i(t_i) = \frac{t_i}{p_i - d_i}, \quad 0 \leq t_i \leq p_i - d_i, \quad i \in I,
\]

I = index set of fuzzy constraints. 

As the equivalent crisp model to (15.1), the following LP was used: 

\[
\begin{aligned}
\text{maximize} \quad & z' = c^T x - \sum_{i \in I} s_i (p_i - d_i)\, \mu_i(t_i) \\
\text{such that} \quad & Ax \leq d + t \\
& Bx \leq b \\
& t \leq p - d \\
& x, t \geq 0
\end{aligned}
\tag{15.2}
\]

where the $s_i$ are problem-dependent scaling factors with penalty character. 

Formulation (15.2) only makes sense if problem-dependent penalty terms $s_i$, 
which also have the required scaling property, can be found and justified. 

In this case the following definitions performed successfully: First the crisp 
constraints $Bx \leq b$ were replaced by $Bx \leq .9b$, providing a 10% leeway of 
capacity, which was desirable for reasons of safety. Then “tolerance” variables t were 
introduced: 

\[
\begin{aligned}
Bx - t &\leq .9b \\
t &\leq .1b
\end{aligned}
\]

The objective function became 

\[
\text{maximize} \quad z = c^T x - s^T t
\]

where s was defined to be 

\[
s = \frac{\text{average profit of shipping a full container}}
         {\text{average number of time periods that elapsed between departure and arrival of a container}}
\]

Because of this definition, more than 90% of the capacity of the ships was used 
only if and when very profitable full containers were available for shipping at the 
ports, a policy that seemed to be very desirable to the decision makers. 
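
The effect of the penalty s can be illustrated on a deliberately tiny instance. The sketch below is not the model of Ernst: it assumes a single ship, a single capacity constraint, containers of unit size, and cargo classes that differ only in profit per container. Under these assumptions the LP with one tolerance variable t can be solved greedily, and all names and numbers are hypothetical.

```python
def load_ship(cargo, capacity, s):
    """Greedy optimum of: maximize sum(profit * x) - s * t
    subject to sum(x) - t <= 0.9 * capacity, 0 <= t <= 0.1 * capacity,
    0 <= x_k <= available_k.

    cargo: list of (profit_per_container, available_containers).
    Returns (containers shipped, tolerance used t, objective value).
    """
    base_left = capacity * 9 / 10       # capacity usable without penalty
    tolerance_left = capacity / 10      # extra capacity, penalized by s
    shipped = used_tolerance = objective = 0.0
    for profit, available in sorted(cargo, reverse=True):
        if profit <= 0:
            break                       # unprofitable cargo stays in port
        in_base = min(available, base_left)
        base_left -= in_base
        shipped += in_base
        objective += profit * in_base
        if profit > s:                  # tolerance pays off only above s
            extra = min(available - in_base, tolerance_left)
            tolerance_left -= extra
            used_tolerance += extra
            shipped += extra
            objective += (profit - s) * extra
    return shipped, used_tolerance, objective


if __name__ == "__main__":
    # Hypothetical data: capacity 100 containers, penalty s = 50.
    print(load_ship([(80, 60), (40, 60)], capacity=100, s=50))
    print(load_ship([(80, 95), (40, 60)], capacity=100, s=50))
```

In the first call the ship is filled only up to 90% because the remaining cargo earns less than s per container; in the second call the tolerance is used for the five highly profitable containers that no longer fit into the 90% share.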

Before turning to another application area, it should be mentioned that other 
applications of fuzzy set theory can be found in the literature [Oh Eigeartaigh 
1982] and that the development of model (14.9) was initially triggered by a real 
problem in logistics described by Zimmermann [1976]. 


15.3.3 Fuzzy Sets in Scheduling 


Scheduling is a very common activity in management. It concerns very different 
areas, e.g., production, maintenance, transportation, activities, etc. The 
environments in these areas differ from each other and so do the specific constraints 
that have to be taken into consideration. Some of these scheduling tasks 
have more of an engineering character (such as in telecommunication, in repair, in 
computer networks, etc.). They are not considered here. We rather focus on those 
areas that generally belong to the management domain. Sometimes planning 
and scheduling (or control) are closely related to each other and can hardly be 
separated. Also, some neighboring areas, such as production control and inventory 
control, are very much interrelated. We shall present six cases which cover 
different areas and also different approaches for planning and scheduling. 


15.3.3.1 Job-Shop Scheduling with Expert Systems [Bensana et al. 1988]. 
In the following, we will present a job shop scheduling approach where concepts 
from the field of artificial intelligence and concepts of fuzzy set theory enrich 
traditional OR. 

Different kinds of knowledge cooperate in the determination of feasible sched- 
ules. One kind of knowledge is represented by rules. Relevances of rules with 
respect to facts and goals are expressed by concepts of fuzzy set theory. First, we 
will sketch the system. Second, we will focus on the application of fuzzy set 
theory within the system. 

The scheduling problem in a workshop can be stated as follows: Given a set 
of machines and technological constraints, and given production requirements 
expressed in terms of quantities, product quality, and time constraints expressed 
by means of earliest starting times and due dates for jobs, find a feasible sequence 
of processing operations. 

A set of K jobs must be performed by a set of M machines. Each job k is 
characterized by a set of operations $O_k$ assigned to machines on which they have to 
be performed. A schedule is described by means of a precedence graph, expressed 
by a set of pairs $(O_i, O_j)$ denoting that $O_i$ must precede $O_j$. 

The system, implemented in LISP and named OPAL, consists of two planning 
modules—the “constraint-based” analysis module and the “decision-support” 
module—whose interaction is guided by a “supervisor” module. The supervisor 
module plays the role of the inference engine and guides the search process. The 
structure of the system is shown in figure 15-17. 

The constraint-based analysis (CBA) module deals with a partial order of 
operations derived from the processing sequence of parts and the schedule in 
progress on one side and the time constraints for job processing on the other. By 
subsequent systematic comparisons of the existing precedence constraints, new 
precedence constraints are generated. This procedure stops in one of the follow- 
ing states: 


success: A feasible and complete schedule is derived. 
failure: Due to conflicting precedence constraints, a feasible schedule does not 
exist. 
wait: The schedule in progress is incomplete (i.e., the set of precedence 
constraints does not form a complete order), and no more precedence 
constraints can be generated. 


Figure 15-17. Structure of OPAL. 


If the CBA module reaches a “wait” state, the decision pertaining to operation 
ranking is no longer dictated by feasibility considerations with respect to due 
dates. Such a decision can be made according to other kinds of criteria of a tech- 
nological nature (e.g., it is better not to cut a workpiece made of metal M before 
a workpiece made of metal M’), or related to productivity (facilitate material flow, 
avoid filling up machine input buffers, avoid long set-up times . . .). 

According to these criteria, a decision-support module (DS) generates new 
precedence constraints. First, it selects a subset C of the set of all unordered pairs 
of operations. Second, it chooses one element of C and forms a new precedence 
constraint. 

The selection can be based on criteria like specific machines, specific opera- 
tions, temporal location, influence on the quality of the schedule, or influence on 
the resolution speed. The grades of membership of the unfixed pairs of opera- 
tions in the sets defined by those criteria may be expressed fuzzily. If more than 
one criterion is used for selection, the corresponding fuzzy sets are intersected 
by the minimum-operator. 

In the second step, one element of this fuzzy set is chosen to be fixed, that is, 
to be the new precedence constraint. This step is carried out by using a collec- 
tion of pieces of advice expressed as “if... then” rules. Rules differ by their 
origin and by their range of application (general or application-dedicated). More- 
over, their efficiency is more or less well known and depends upon the prescribed 
goal, or the state of completion of the schedule. These rules can express antago- 
nistic points of view. Lastly, they are usually pervaded by imprecision and fuzzi- 
ness, because their relevance in a given situation cannot be determined in an 
all-or-nothing manner. 

To take these features into account, each rule r is assigned a grade of 
relevance $\pi_r(k)$ with respect to goal k. $\pi_r(k)$ can be viewed as the grade of 
membership of rule r to the fuzzy set of relevant rules for goal k. The aim of these 
coefficients is basically to create an order on the set of rules. For every pair 
of operations, the “if” part of a rule is evaluated as to the extent to which 
$O_i$ should precede $O_j$ according to the attribute of the rule. Let v be the index 
qualifying this attribute, and let $v_{ij}$ be the value of this index when $O_i$ 
precedes $O_j$. The ratio $x_{ij} = v_{ij} / v_{ji}$ is then evaluated. To avoid thresholding 
effects, three fuzzy sets H = high ratio, M = medium ratio, and S = small ratio 
are defined (see figure 15-18). Hence the relation appearing in the rule is a fuzzy 
relation. 

Figure 15-18. Fuzzy sets for the ratio in the “if” part of the rules. 


The “then” part of all rules is of the same format. It provides advice about 
whether $O_i$ should precede $O_j$ (i < j) or if the rule does not know (i ~ j). This 
advice is expressed by three numbers: 

\[
\begin{aligned}
\mu_r(i < j) &= \min(\mu_S(x_{ij}), \pi_r(k)) \\
\mu_r(i \sim j) &= \min(\mu_M(x_{ij}), \pi_r(k)) \\
\mu_r(j < i) &= \min(\mu_H(x_{ij}), \pi_r(k))
\end{aligned}
\]

The rules relevant for goal k are all triggered and applied to all facts in the set 
C. The proportions of relevant triggered rules preferring i < j, j < i, i ~ j are 
obtained as relative cardinalities (see definition 2-50): 

\[
\begin{aligned}
p(i < j) &= \sum_r \mu_r(i < j) \Big/ \sum_r \pi_r(k) \\
p(j < i) &= \sum_r \mu_r(j < i) \Big/ \sum_r \pi_r(k) \\
p(i \sim j) &= \sum_r \mu_r(i \sim j) \Big/ \sum_r \pi_r(k)
\end{aligned}
\]

When p(i ~ j) is close to 1, it is not possible to decide which of the two 
operations should precede the other because the rules are indifferent. In contrast, when 
p(i ~ j) is close to 0, but p(i < j) is close to p(j < i), the set of rules is strongly 
conflicting. The preference index for decision i < j is defined as 
min {p(i < j), 1 - p(i ~ j)}; in terms of fuzzy logic, it expresses to what extent most of the 
triggered rules prescribe i < j, and most are not indifferent about $O_i$ preceding $O_j$. 
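
The aggregation of rule advice described above can be sketched as follows. The shapes of the fuzzy sets S, M, and H are defined in figure 15-18, which is not reproduced here, so the piecewise-linear functions below are purely hypothetical stand-ins; the assignment of S to "i < j", M to "i ~ j", and H to "j < i" is likewise an assumption of this sketch, and all function names are illustrative.

```python
# Hypothetical membership functions for the ratio (NOT those of figure 15-18).
def mu_small(x):
    return max(0.0, min(1.0, (1.0 - x) / 0.5))


def mu_medium(x):
    if x <= 1.0:
        return max(0.0, (x - 0.5) / 0.5)
    return max(0.0, 2.0 - x)


def mu_high(x):
    return max(0.0, min(1.0, x - 1.0))


def advice(ratio, relevance):
    """The three numbers produced by the 'then' part of one rule."""
    return {
        "i<j": min(mu_small(ratio), relevance),
        "i~j": min(mu_medium(ratio), relevance),
        "j<i": min(mu_high(ratio), relevance),
    }


def proportions(rules):
    """Relative cardinalities over all triggered rules.

    rules: list of (ratio, relevance) pairs, one pair per rule.
    """
    total_relevance = sum(relevance for _, relevance in rules)
    if total_relevance == 0:
        raise ValueError("no relevant rule was triggered")
    sums = {"i<j": 0.0, "i~j": 0.0, "j<i": 0.0}
    for ratio, relevance in rules:
        for key, value in advice(ratio, relevance).items():
            sums[key] += value
    return {key: value / total_relevance for key, value in sums.items()}


def preference_index(p, decision="i<j"):
    """min{p(decision), 1 - p(i ~ j)} as defined in the text."""
    return min(p[decision], 1.0 - p["i~j"])


if __name__ == "__main__":
    agreeing = proportions([(0.5, 1.0), (0.5, 0.5)])
    conflicting = proportions([(0.5, 1.0), (2.0, 1.0)])
    print(preference_index(agreeing), preference_index(conflicting))
```

With two rules that both favor i < j the index reaches 1, whereas two equally relevant rules with opposite advice yield p(i < j) = p(j < i) = .5 and hence the strongly conflicting case mentioned in the text.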

The schedule is gradually built up by adding precedence constraints between 
operations. The search graph is developed as follows: each time the CBA module 
stops, a new node is generated and the current schedule is stored. The DS module 
then generates a new precedence constraint to the schedule graph, and the CBA 
module checks for consequent precedence constraints. Backtracking occurs if the 
explored path leads to a failure state. When no feasible schedule at all exists, the 
data must be modified in order to recover feasibility. 


15.3.3.2 A Method to Control Flexible Manufacturing Systems [Hintz and 
Zimmermann 1989]. The following application shows the usage of multiple 
concepts of fuzzy set theory within a hybrid system for production planning and 
control (PPC) in flexible manufacturing systems (FMSs). FMSs are integrated 
manufacturing systems consisting of highly automated work stations linked by a 
computerized material-handling system making it possible for jobs to follow 
diverse routes through the system (see figure 15-19). They facilitate small batch 
sizes, high quality standards, and efficiency of the production process at the same 
time. 

Decentralized PPC systems for each FMS are provided with schedules of 
complete orders by an aggregate planning system. They are responsible for 
meeting the due dates, minimizing flow times, and maximizing machine utiliza- 
tions. Generally, these objectives are conflicting. The planning process is carried 
out by subsequently solving the subproblems: 


1. Master scheduling 
2. Tool loading 
3. Release scheduling 
4. Machine scheduling 


Subproblem 1 is solved by using fuzzy linear programming (FLP), subproblem 
2 is solved by a heuristic algorithm, and subproblems 3 and 4 are solved using 
approximate reasoning (AR). We will just sketch the master scheduling, omit the 
tool loading, and focus on the release and machine scheduling. 

Figure 15-19. Example of an FMS [Hartley 1984, p. 194]. 


Master Scheduling. The objective of the master schedule is to determine a 
short-term production program with a well-balanced machine utilization that opti- 
mally meets all due dates. Its determination is a quite well-structured problem, 
although some important input data are rather uncertain. Since nearly the com- 
plete manufacturing of a part can be performed within an FMS, a simultaneous 
approach using FLP (as defined in section 14.2.1) has been employed for the 
master scheduling. Restrictions to be considered in the master schedule are as 
follows: 


1. Parts can only be processed when they are released from earlier production 
stages. 

2. They have to meet given due dates in order to match the following opera- 
tions and assembling. 

3. The capacity of the FMS must not be exceeded. Because the machines may 
partially be substituted by each other, they have to be classified into appro- 
priate groups. 

4. There is only a limited number of (expensive) fixtures and pallets available. 


In restrictions 1 and 2, release and due dates are often rough estimates that 
include safety buffers and unnecessary work-in-process inventories. In practice 
it is often possible to supply some parts earlier than initially planned (e.g., by 
overtime) or to violate the due dates only for a portion of an order (for instance, 
by lot-size splitting) without seriously disturbing processing or assembling. On 
the other hand, if release dates or due dates are chosen too stringently, there may 
be no feasible solution at all. 

For these reasons, restrictions 1 and 2 are modeled as fuzzy constraints while 
restrictions 3 and 4 are modeled as crisp constraints. Solving the FLP 
yields a solution 


• that is feasible according to restrictions 1-4, if possible, or 

• that minimizes the deviations from given due dates and distributes them 
uniformly among the different orders. The value of the maximized variable then 
denotes the degree of membership of the optimal solution in the set of 
feasible and optimal solutions. 


Release and Machine Scheduling. Decisions concerning the parts schedule for 
both releasing and machining are arrived at by AR. This is considered to be an 
appropriate way to model a very complex situation with many interdependencies. 
The decision criteria are formulated in terms of production rules, which have been 
shown to lead to quite stable decisions. It will be shown later that this approach 
also leads at least to a very good compromise of the three conflicting goals of 
scheduling mentioned above. In addition, this method is very suitable for interactive 
decision making, where the decision maker can employ familiar linguistic 
descriptions of the situations. 

The basic release scheduling procedure can be regarded as dispatching parts for a 
single capacity unit (the FMS) with several work stations: As long as unused 
working places and appropriate pallets with fixtures are available, new parts can be 
released into the FMS. Once the upper limit of parts has been reached, the remaining 
parts have to wait in a queue until one of the parts leaves the FMS. Then the decision 
of which part should be released next will be made using an AR procedure. 

The machine scheduling procedure is very similar to dispatching when using 
priority rules. This means that no machine is allowed to wait if there is a part that 
can be processed on that machine. If there are several parts at a time waiting for a 
machine, then another AR procedure is used to choose a part from the waiting line. 

For both AR procedures, a hierarchy of decision criteria is defined (see figure 
15-20). This hierarchy corresponds to stepwise operationalizing the decision cri- 
teria until they can easily be used by the decision maker. On the other hand, such 
a hierarchy can be considered as the combination of elementary local-priority 
rules in a more comprehensive global-priority or decision rule. The single ele- 
ments or concepts of the hierarchy may in general consist of arithmetic or lin- 
guistic terms. Both the hierarchy and the ways to make the concepts operational 
are heuristic in nature. Hence no optimal solution can be guaranteed. 

Let us further concentrate on the criteria hierarchy depicted in figure 15—20a 
for the release scheduling. The decision of which part to release next mainly 
depends on date criteria of the parts under consideration or the impact of parts 
on machine utilization, or it may depend on some kind of external priority. For 


Figure 15-20. Criteria hierarchies. (a) Release scheduling: date criteria (slack
time, waiting time), utilization (uniformity of utilization, utilization without
personnel), and external priority. (b) Machine scheduling ("process part j next on
machine i"): slack time, operating time, and alternative machines.
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the date criteria, we furthermore distinguish between the slack time of a part and 
the time the part has already waited for processing. 

The impact on machine utilization can be twofold. First, we have
to take care that the machines are used as uniformly as possible, thus trying to
avoid bottlenecks. For this purpose, we define a criterion “uniformity of
utilization.” On the other hand, we want to ensure good utilization in the shift with
reduced personnel, during which no parts can be fixed on pallets. In that shift,
parts can only be processed as long as they do not need any manual
operation, be it for changing a pallet or in case of a failure. We shall take this
into consideration by using the concept “processing time until the next fixturing.”
The external priority can be given by the plant manager or some other person 
responsible. 

To illustrate the AR process, we will look at the definitions of the concepts of 
the hierarchy and the aggregation of concepts by the rule set. We will focus on 
the derivation of the date criterion from the slack time and waiting time
criteria. Slack time and waiting time are considered linguistic variables as defined in
section 9.1: 


Linguistic variable Term set 

slack time critically_short, short 
waiting time short, medium, long 
date criterion urgent, not_urgent 


The base variable is defined for all possible values for the indicator, that is, in 
general, all real numbers within a reasonable interval. The meaning of the terms 
can be defined by giving the degree of membership as a function of the above- 
defined indicator as base variable. As membership functions, piecewise linear 
functions are used. The parameters were obtained by extensive simulation studies 
for a specific structure of orders to be processed in a specific FMS. 

An essential task before aggregating these two criteria into the date criterion is
the assignment of degrees of sensibleness to each element (rule) of the Cartesian
product defined by the assumptions and the conclusion: {long, medium, short}
× {critically_short, short} × {urgent, not_urgent}. This can be done by an
expert (scheduler) and results in the “degrees of sensibleness” shown in
parentheses for each rule below.

An example rule set might be (degree of sensibleness given in parentheses): 


1. IF waiting time is long AND slack time is critically_short THEN date
   criterion is urgent (1.0)

2. IF waiting time is medium AND slack time is critically_short THEN date
   criterion is urgent (0.8)

3. IF waiting time is short AND slack time is critically_short THEN date
   criterion is urgent (0.6)

4. IF waiting time is long AND slack time is short THEN date criterion is
   urgent (0.5)

5. IF waiting time is medium AND slack time is short THEN date criterion is
   urgent (0.2)

6. IF waiting time is medium AND slack time is short THEN date criterion is
   not_urgent (0.7)


Each of these rules can now be interpreted as one possible aggregation of the
two criteria “slack time” and “waiting time” into the “date criterion” (see figure
15-20a). Only rules with a nonzero degree of sensibleness are considered. The
AR procedure applied is depicted in figure 15-21. That is, first the conditional
parts of the rules connected by “AND” or “OR” are aggregated by using the γ
operator. The “THEN” of the rule is then interpreted as “the conditions hold and
the rule is valid,” where this “AND” is also modeled by the γ operator. In this
case, however, γ is taken to be zero, since no compensation is assumed between
the truth of the rule and the validity of its conditions. If more than one rule leads
to the same conclusion, the maximum of the respective degrees of membership
determines the final result.

Figure 15-21. Principle of approximate reasoning.


Example 15-3 


We want to compute the values (degrees of membership) of the terms of the “date 
criteria” in figure 15—20a. Consider three parts, whose slack time and waiting 
time are linguistic variables as described above. The grades of membership in 
terms of the linguistic variables are given in table 15-9. 

In the first step, the conditional parts of the rules are aggregated by using the
γ operator. In this example, γ = 0.5 is used. The results are depicted in table 15-10.
In the second step, the rules are evaluated. The use of the γ operator with γ = 0
is equivalent to the multiplication of the degree of membership of the condition 
and the degree of sensibleness. The results are summarized in table 15-11, where 
the maxima of the respective degree of membership for the two terms (urgent, 
not_urgent) of the linguistic variable “date criteria” are printed in bold. Part 3 in 
the table shows the highest degree of membership in the fuzzy set of parts with 
urgent date criteria and the lowest degree of membership in the fuzzy set of parts 
with not_urgent date criteria. 
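
The two evaluation steps of example 15-3 can be sketched in a few lines of Python. The sketch assumes the unweighted form of the γ operator, that is, the product of the memberships raised to the power 1 − γ times their algebraic sum raised to the power γ; the function and variable names are illustrative and not taken from the original implementation. It evaluates part 3 of table 15-9.

```python
from math import prod

def gamma_and(memberships, gamma):
    """Compensatory 'and' (gamma operator), assumed unweighted form:
    product ** (1 - gamma) * algebraic_sum ** gamma."""
    product = prod(memberships)
    algebraic_sum = 1.0 - prod(1.0 - m for m in memberships)
    return product ** (1.0 - gamma) * algebraic_sum ** gamma

# Membership grades of part 3, taken from table 15-9.
waiting = {"long": 0.7, "medium": 0.3, "short": 0.0}
slack = {"critically_short": 0.7, "short": 0.3}

# (waiting-time term, slack-time term, conclusion, degree of sensibleness)
RULES = [
    ("long", "critically_short", "urgent", 1.0),
    ("medium", "critically_short", "urgent", 0.8),
    ("short", "critically_short", "urgent", 0.6),
    ("long", "short", "urgent", 0.5),
    ("medium", "short", "urgent", 0.2),
    ("medium", "short", "not_urgent", 0.7),
]

def date_criterion(waiting, slack, gamma=0.5):
    """Return the membership grades of the terms of the date criterion."""
    result = {"urgent": 0.0, "not_urgent": 0.0}
    for w_term, s_term, conclusion, sensibleness in RULES:
        # Step 1: aggregate the conditional part with the gamma operator.
        condition = gamma_and([waiting[w_term], slack[s_term]], gamma)
        # Step 2: gamma = 0 reduces the operator to the plain product.
        rule_value = gamma_and([condition, sensibleness], 0.0)
        # Several rules with the same conclusion are combined by the maximum.
        result[conclusion] = max(result[conclusion], rule_value)
    return result

part3 = date_criterion(waiting, slack)
print({term: round(value, 2) for term, value in part3.items()})
```

Rounded to two digits, the result for part 3 agrees with the bold entries of table 15-11 (0.67 for urgent and 0.15 for not_urgent).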


Results. The approach described above has been programmed, and its perfor- 
mance has been compared to systems with no master scheduling and employing 


Table 15-9. Membership grades for slack time and waiting time.

                                    Membership grade of part
                                    1        2        3
Waiting time:  long                 0.7      0        0.7
               medium               0.2      0.8      0.3
               short                0        0.4      0
Slack time:    critically_short     0.4      0.8      0.7
               short                0.6      0.2      0.3


Table 15-10. Membership grades for conditional parts of the rules.

                 Part
                 1        2        3
Condition 1      0.58     0.00     0.67
Condition 2      0.20     0.78     0.41
Condition 3      0.00     0.53     0.00
Condition 4      0.72     0.00     0.41
Condition 5      0.29     0.37     0.21
Condition 6      0.29     0.37     0.21


Table 15-11. Membership grades for the rules.

                                 Part
                                 1        2        3
Date criterion is urgent:
  conclusion 1                   0.58     0.00     0.67
  conclusion 2                   0.16     0.62     0.33
  conclusion 3                   0.00     0.32     0.00
  conclusion 4                   0.36     0.00     0.21
  conclusion 5                   0.06     0.07     0.04
Date criterion is not urgent:
  conclusion 6                   0.20     0.26     0.15


only simple priority rules for release and machine scheduling using a general
simulation program for FMS. The results are shown in table 15-12. The suggested
approach dominated the classical priority scheduling with respect to all three 
objectives. 


15.3.3.3 Aggregate Production and Inventory Planning [Rinks 1982a, b]. 
The “HMMS-model” [Holt et al. 1960] is one of the best-known classical models 
in aggregate production planning. It assumes that the main objective of the pro- 
duction planner is to minimize total cost, which is assumed to consist of costs of 
regular payroll, overtime and layoffs, inventory, stock-outs, and machine setup. 
The model assumes quadratic cost functions and then derives linear decision rules 
for the production level and the work-force level. The following terminology 
is used: 
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Table 15-12. Results. 


                                                  Suggested    Priority rule
Criteria                                          approach     approach
Mean in-process waiting time [min]                2,884        3,369
Part of lots that have met their due dates [%]    97           28
Mean machine utilization [%]                      80           79


$FS_t$ = sales forecast for period t
$W_{t-1}$ = work force level in period t − 1
$I_{t-1}$ = inventory level at the end of period t − 1
$\Delta W_t$ = change in work force level in period t
$P_t$ = production level in period t


In general, the decision variables are related to the cue variables as

$$P_t = f(FS_t, W_{t-1}, I_{t-1})$$
$$\Delta W_t = g(FS_t, W_{t-1}, I_{t-1})$$


By contrast to most other models, the HMMS-model was tested empirically for 
a paint factory. The cost coefficients were derived in different ways (statistically, 
heuristically, etc.), and the performance of the decision rules was compared to 
the actual performance of the paint factory managers [Holt et al. 1960]. 

The following model resulted for the paint factory. 


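Model 15-5

$$\text{minimize } C_N = \sum_{t=1}^{N} C_t$$

where

$$
\begin{aligned}
C_t ={} & [340\,W_t] && \text{Regular payroll costs} \\
        & + [64.3\,(W_t - W_{t-1})^2] && \text{Hiring and layoff costs} \\
        & + [0.20\,(P_t - 5.67\,W_t)^2 + 51.2\,P_t - 281\,W_t] && \text{Overtime costs} \\
        & + [0.0825\,(I_t - 320)^2] && \text{Inventory-connected costs}
\end{aligned}
$$

and subject to restraints

$$I_{t-1} + P_t - S_t = I_t, \qquad t = 1, 2, \ldots, N$$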
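
As a rough illustration, the period cost of model 15-5 can be evaluated for a given plan as follows. This is only a sketch under the assumption that the quadratic terms are as in the standard HMMS paint factory formulation; the function names and the sample plan are hypothetical.

```python
def hmms_period_cost(w, w_prev, p, inv):
    """Cost C_t of one period for work force w, previous work force w_prev,
    production p, and end-of-period inventory inv."""
    regular_payroll = 340.0 * w
    hiring_layoff = 64.3 * (w - w_prev) ** 2
    overtime = 0.20 * (p - 5.67 * w) ** 2 + 51.2 * p - 281.0 * w
    inventory = 0.0825 * (inv - 320.0) ** 2
    return regular_payroll + hiring_layoff + overtime + inventory

def total_cost(w0, inv0, plan, sales):
    """Sum C_t over a plan given as a list of (W_t, P_t) pairs.

    The inventory is updated with the balance I_{t-1} + P_t - S_t = I_t.
    """
    total, w_prev, inv = 0.0, w0, inv0
    for (w, p), s in zip(plan, sales):
        inv = inv + p - s
        total += hmms_period_cost(w, w_prev, p, inv)
        w_prev = w
    return total

# Hypothetical one-period plan: constant work force, production equal to sales.
print(round(total_cost(100, 320, [(100, 567)], [567]), 1))
```

For this sample plan only the regular payroll and the linear part of the overtime term contribute, since the work force does not change and the inventory stays at 320.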
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Even though the HMMS-model performed quite well and is used as a common 
benchmark for later models, it was rarely used in practice. The main objection 
was generally that managers would not use it, roughly speaking, because too 
much mathematics was involved. 

Rinks tries to avoid this lack of acceptance by suggesting a model based on 
the concepts described in chapters 9 and 10 of this book. He developed one 
production and one work-force algorithm that consist of a series of relational 
assignment statements (rules) of the form 


If $FS_t$ is ... and $I_{t-1}$ is ... and $W_{t-1}$ is ... then $P_t$ is ...
Else ...

and

If $FS_t$ is ... and $I_{t-1}$ is ... and $W_{t-1}$ is ... then $\Delta W_t$ is ...
Else ...

respectively.


He uses the definitions (given in table 15-13) of the terms of the linguistic
variables. Figure 15-22 sketches the membership functions of the terms of the
linguistic variables used. Forty decision rules were suggested (see table 15-14);
these were not claimed to be optimal but rather heuristic in character and
acceptable to the manager.
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Figure 15-22. Membership functions for several linguistic terms. 
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Table 15-13. Definition of linguistic variables [Rinks 1982]. 


Base Membership 
Linguistic terms Acronym variable" function expression” 
VERY HIGH VH x HIGH x * HIGH x 
(POSITIVE, VERY BIG) (PVB) (dx) 
HIGH H x 1 — ef Sita”) 
(POSITIVE BIG) (PB) (dx) 
RATHER HIGH RH x 1 — e0250] 
(POSITIVE, RATHER BIG) (PRB) (dx) 
SORTOF HIGH SH x 1 — ef 025/0-4-x1"%) 
(POSITIVE, SORTOF BIG) (PSB) (dx) 
AVERAGE A x 1 — eH 
(ZERO) (Z) (dx) 
SORTOF LOW SL x 1 — el02504] 
(NEGATIVE, SORTOF BIG) (NSB) (dx) 
RATHER LOW RL x 1 — el(0.251-0.7-)"*) 
(NEGATIVE, RATHER BIG) (NRB) (dx) 
LOW L x 1 — e050] 
(NEGATIVE BIG) (NB) (dx) 
VERY LOW VL x LOW x * LOW x 
(NEGATIVE, VERY BIG) (NVB) (dx) 
AT LEAST AVERAGE ALA x 1- e -l1<x<0 

1 O0<xx 1 

AT MOST AVERAGE AMA x l -l<x<0 


l-e! —O0<x<1 


a. x is any one of the following variables: $I_{t-1}$, $FS_t$, $W_{t-1}$, and $P_t$; dx is $\Delta W_t$.
b. All variables are scaled to be placed in the [−1, 1] interval.
c. dx replaces x in the membership function expression for use with $\Delta W_t$.


To test the performance of the suggested approach, the data of the paint factory 
of the HMMS-model were used. In order to apply Rinks' decision rules, the
membership functions of the terms, as shown in figure 15-22, had to be calibrated. In
fact the range [—1, 1] on the horizontal axis of this figure had to be calibrated to 
the data. For test purposes, lower and upper bounds as shown in the following 
tabulation were derived from available historical data (HMMS): 
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Table 15-14. Membership functions. 


            Cue variables                       Decision variables
Rule no.    $FS_t$    $I_{t-1}$    $W_{t-1}$    $P_t$    $\Delta W_t$
1 H AMA H H Z 
2 H AMA A RH PRB 
3 H AMA L SH PVB 
4 SH L H H Z 
5 SH L A RH PRB 
6 SH L L SH PVB 
7 SH SH H SH NRB 
8 SH SH A A Z 
9 SH SH L A PRB 
10 A A H SH NRB 
11 A A A A Z 
12 A A L A PRB 
13 SL SL H SH NRB 
14 SL SL A A Z 
15 SL SL L SL PRB 
16 RL L H SH NRB 
17 RL L A A Z 
18 RL L L A PRB 
19 L ALA H SL NVB 
20 L ALA A RL NRB 
21 L ALA L L Z 
22 SL H H SL NVB 
23 SL H A RL NRB 
24 SL H L RL Z 
25 H AMA SH H PSB 
26 H AMA SL SH PB 
27 SH L SH H PSB 
28 SH L SL SH PB 
29 SH SH SH A Z 
30 SH SH SL A PSB 
31 A A SH A NSB 
32 A A SL A PSB 
33 SL SL SH A NSB 
34 SL SL SL A Z 
35 RL L SH A NSB 
36 RL L SL A Z 
37 L ALA SH RL NB 
38 L ALA SL L NSB 
39 SL H SH RL NB 
40 SL H SL RL Z 
1. Acronyms for the values of the linguistic variables are defined in table 15-13. 
2. Each production rule is a fuzzy relational assignment statement of the form “IF $FS_t$ is __ AND
$I_{t-1}$ is __ AND $W_{t-1}$ is __ THEN $P_t$ is __.”
3. Each work force rule is a fuzzy relational assignment statement of the form “IF $FS_t$ is __ AND
$I_{t-1}$ is __ AND $W_{t-1}$ is __ THEN $\Delta W_t$ is __.”
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Variable Lower bound Upper bound 
$W_{t-1}$        60             115
$\Delta W_t$     −10            10
$P_t$            250            750
$I_{t-1}$        150            490
$FS_t$           250            750


In the absence of historical data, the manager would use his or her judgment 
to make the determinations. For computations, the max-min compositions were
used, resulting in fuzzy sets representing the “conclusion” or “decision.” Since,
however, a decision concerning the work force, production, or inventory of the next
period should be a crisp decision, Rinks used the maximum rule if possible. If
the membership function did not have a unique maximum, he used other, heuristic
rules to choose the crisp decision to be implemented.
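
The mechanics of such a rule evaluation can be illustrated with a small sketch. Several assumptions are made here that are not in the source: the terms are simplified to triangular membership functions on the scaled interval [−1, 1], only three illustrative rules are used (the first corresponds to rule 11 of table 15-14, the other two are hypothetical), the max-min composition is realized as a rule-by-rule clipping of the conclusion followed by a pointwise maximum, and a non-unique maximum is resolved by averaging the maximizing values, which is just one possible heuristic.

```python
def tri(x, a, b, c):
    """Triangular membership function with support (a, c) and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical, simplified terms on the scaled interval [-1, 1].
TERMS = {
    "L": lambda x: tri(x, -2.0, -1.0, 0.0),
    "A": lambda x: tri(x, -1.0, 0.0, 1.0),
    "H": lambda x: tri(x, 0.0, 1.0, 2.0),
}

# Illustrative production rules: (FS term, I term, W term) -> P term.
RULES = [
    (("A", "A", "A"), "A"),   # rule 11 of table 15-14
    (("H", "A", "A"), "H"),   # hypothetical
    (("L", "A", "A"), "L"),   # hypothetical
]

GRID = [k / 20 for k in range(-20, 21)]  # discretized base variable

def infer(fs, inv, w):
    """Return the fuzzy conclusion for P as a dict {value: membership}."""
    strengths = [
        (min(TERMS[t_fs](fs), TERMS[t_inv](inv), TERMS[t_w](w)), t_out)
        for (t_fs, t_inv, t_w), t_out in RULES
    ]
    # Clip each conclusion at the rule's firing strength, combine by maximum.
    return {y: max(min(s, TERMS[t_out](y)) for s, t_out in strengths)
            for y in GRID}

def maximum_rule(fuzzy_output, tol=1e-12):
    """Pick a crisp value: the mean of all values with maximal membership."""
    peak = max(fuzzy_output.values())
    maximizers = [y for y, mu in fuzzy_output.items() if mu >= peak - tol]
    return sum(maximizers) / len(maximizers), peak

crisp_p, degree = maximum_rule(infer(fs=0.2, inv=0.0, w=0.0))
print(round(crisp_p, 3), round(degree, 3))
```

Because the conclusions are clipped, the aggregated membership function has a plateau rather than a single peak whenever the strongest rule fires with a degree below 1, which shows why a tie-breaking heuristic is needed in addition to the maximum rule.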

For the 60 months of data for the HMMS-model (1949-1953), the results of 
the work-force algorithm are shown in figure 15-23. The cost results are shown 
in table 15-15. 

Rinks' own evaluation of the simulation results reads as follows:


While the 5.0 per cent cost penalty evidenced by the production scheduling fuzzy algo- 
rithms is somewhat greater than that reported by other heuristics—Search Decision 


Figure 15-23. Comparison of work force algorithms (work force in men over years 1
to 5; linear decision rules versus fuzzy algorithm model).
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Table 15-15. Cost results. 
Costs (1,000$)        Linear DR HMMS (optimal)    Fuzzy algorithm
Regular payroll       1,879                       1,814
Hiring and layoff        20                          22
Overtime                129                         251
Inventory                25                          43

Total cost            2,053                       2,130


Table 15-16. Comparison of performances.

                                  Ice cream    Chocolate    Candy      Paint
Decision rule (perfect)           100%         100%         100%       100%
Decision rule (moving average)    104.9%       102.0%       103.3%     110%
Company performance               105.3%       105.3%       114.4%     139.5%
Management coefficients           102.3%       100.0%       124.1%     124.7%
Correlation                       W = .78      W = .57      W = .73    W = .40
                                  P = .97      P = .93      P = .86    P = .66


Rule [Taubert 1967] and Parametric Production Planning [Jones 1967] reported cost 
penalties of less than one percent for the paint factory—it must be remembered that
the fuzzy algorithms do not even require an explicit cost function. For situations where
restrictive assumptions cannot be rationalized and sufficient data is not available to
construct a cost function, approximate reasoning based models would seem to offer an 
appealing alternative [Rinks 1982b, p. 579]. 


If Rinks had compared his results to other benchmarks, he would probably have 
been more optimistic. Table 15-16 is from Bowman [1963, p. 104] and shows
the real performance and the performance of another heuristic, the management
coefficient approach, in the case of the HMMS paint factory and three other
plants. Compared to the 139.5% and 124.7% performance of these two approaches,
the 105% performance of the fuzzy algorithm would look even better. 
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15.3.3.4 Fuzzy Mathematical Programming for Maintenance Scheduling. 
The following application, based on a master's thesis from Zittau, Germany, is of
interest because the effects of different operators were investigated and because 
parametrized membership functions were used. 


Model 15-6 [Holtz and Desonki 1981] 


The problem objective here is to determine optimal maintenance cycles in elec- 
trical power plants. Stochastic models had been used before, but because of the 
very low frequency of breakdowns, it seemed that a model based on frequentis- 
tic arguments was not appropriate. 


$T_j$ : Cycle times of maintenance operations for j = 1, ..., N maintenance crews
(decision variables)

$x_{ij}$ : Coefficients of the crisp cost function, i = 1, 2, 3; j = 1, ..., N

$y_{ij}$ : Coefficients of the manpower requirement function, i = 1, 2, 3; j = 1, ..., N

$z_{ij}$ : Coefficients of the breakdown function, i = 1, 2, 3; j = 1, ..., N

Mh : Number of man-hours available for maintenance per year

B : Number of breakdowns per year

$B_{\max}$ : Maximum number of acceptable breakdowns per year


Crisp Mathematical Model. For N = 2 and C = total cost, the following crisp 
model was the point of departure: 


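$$\text{minimize } C = C(T_1, \ldots, T_N) = \sum_{j=1}^{N} \left( x_{1j} T_j + x_{2j} + \frac{x_{3j}}{T_j} \right)$$

$$\text{such that } \sum_{j=1}^{N} \left( y_{1j} T_j + y_{2j} + \frac{y_{3j}}{T_j} \right) \le Mh$$

$$\sum_{j=1}^{N} \left( z_{1j} T_j + z_{2j} + \frac{z_{3j}}{T_j} \right) \le B_{\max}$$

$$T_j \ge 0$$
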
The requirements were as follows: 


1. Cost should not exceed 500 considerably—and in no case should exceed an 
upper bound that could be varied. 

2. Manpower Mh should generally not exceed 1,100, and by no means 1,200. 

3. The number of breakdowns can exceed 50 but never 300 (Bmax). 
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Fuzzy Mathematical Model. The symmetrical concept of a decision (defini- 
tion 14-1) was used, and the optimal decision was defined to be 


$$\mu_D(T_j^0) = \max_{T_j} \min_i \mu_i(T_j)$$


Two types of membership functions were investigated: a linear membership func- 
tion and a nonlinear two-parameter membership function. 


Type 1 Membership Functions 


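$$\mu_C(T_j) = \frac{1}{2} \left\{ [1 + \operatorname{sgn}(C_L - C)] + [1 + \operatorname{sgn}(C - C_L)] \cdot \left( \frac{C_U - C}{C_U - C_L} \right) \right\}$$

where $C_L$ and $C_U$ represent the lower and upper bounds for total cost.

$$\mu_{Mh}(T_j) = \frac{1}{2} \left\{ [1 + \operatorname{sgn}(Mh_L - Mh)] + [1 + \operatorname{sgn}(Mh - Mh_L)] \cdot \left( \frac{Mh_U - Mh}{Mh_U - Mh_L} \right) \right\}$$

with $Mh_L$ and $Mh_U$ the lower and upper bounds.

$$\mu_B(T_j) = \frac{1}{2} \left\{ [1 + \operatorname{sgn}(B_L - B)] + [1 + \operatorname{sgn}(B - B_L)] \cdot \left( \frac{B_U - B}{B_U - B_L} \right) \right\}$$

with $B_L$ and $B_U$ the lower and upper bounds.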
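
A small sketch may help to read the sign-function notation of the type 1 membership functions: the first bracket yields full membership at or below the lower bound, the second yields a linear decrease between the bounds. The names are illustrative; the manpower and breakdown bounds are those of requirements 2 and 3, while the upper cost bound of 600 is a hypothetical choice, since the source leaves it variable.

```python
def sgn(x):
    """Sign function: -1, 0, or 1."""
    return (x > 0) - (x < 0)

def mu_type1(value, lower, upper):
    """Linear (type 1) membership function written with sign functions.

    Equals 1 up to `lower` and decreases linearly to 0 at `upper`.
    The expression is not clipped, so values beyond `upper` (which the
    requirements exclude anyway) would give negative results.
    """
    return 0.5 * ((1 + sgn(lower - value))
                  + (1 + sgn(value - lower)) * (upper - value) / (upper - lower))

def decision_membership(cost, manhours, breakdowns, cost_upper=600.0):
    """Symmetric decision: minimum of the three membership degrees."""
    return min(
        mu_type1(cost, 500.0, cost_upper),     # cost_upper is hypothetical
        mu_type1(manhours, 1100.0, 1200.0),
        mu_type1(breakdowns, 50.0, 300.0),
    )

print(mu_type1(1150.0, 1100.0, 1200.0))
print(decision_membership(cost=550.0, manhours=1150.0, breakdowns=100.0))
```

In the full model, cost, man-hours, and breakdowns are functions of the cycle times $T_j$, and the optimal decision maximizes this minimum over the $T_j$.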


Type 2 Membership Function. We shall only show the membership function 
for the objective function. The others are defined accordingly: 


1+ sgn(C -— C,) 
(1 Ee 
b 


1 Cı 


ue(T;)= 3 [1+ sgn(C, — C)]+ 





b, and c; serve as means of better fitting the membership function to the real sit- 
uation. On the other hand, they obviously increase the computational effort. 

Detailed numerical results, as well as a comparison of the performance of the 
min-operator versus the product operator as a model for the intersection, can be 
found in Holtz [1981]. 


15.3.3.5 Scheduling Courses, Instructors, and Classrooms. It is well 
known that the determination of time schedules in which several resources have 
to be combined belongs to the most difficult combinatorial problems in opera- 
tions research. Rarely does one ever try to determine optimal schedules. The 
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determination of feasible schedules is very often the best one can hope for. The 
difficulty of obtaining such schedules by formal algorithms might partly be due 
to the fact that constraints are treated as crisp requirements even though in reality 
they often are flexible. The following case indicates how a combination of fuzzy 
set theory and heuristics can lead to quite acceptable results. 


Model 15-7 [Prade 1979] 


Problem Description. A quarterly schedule in a French university is to be 
determined. There are N (here N = 4) instruction programs; each lasts one year, 
and a student can only attend one of them. Each instruction program I consists
of M(I) courses (here, 10 < M(I) < 14). Each course contains lectures, lab work,
and a final examination. 

A course is taught by one instructor, supported by several teaching assistants. 
An instructor may teach several courses in one or several instruction programs. 
The availability of an instructor differs from person to person. An instructor may 
be present for only some predetermined days of a week; another may be avail- 
able for only some weeks during the quarter. Information about the availability 
of instructors is only known approximately beforehand. 

A schedule has to satisfy seven “global” constraints: 


1. Each instruction program must be completely planned for the entire school
   year.

2. There are precedence constraints between courses (or sometimes parts of
   courses) that are elements of the same instruction program.

3. It is not desirable that more than four weeks elapse between the first lecture
   of a course and its final examination.

4. It is not desirable that any course that has already begun is interrupted for
   more than a week.

5. Some courses can be in common in several instruction programs.

6. An instructor is not always available.

7. It is very desirable that several courses (three or four) are planned during the
   course of the same week.



Constraints 1, 2, 5, and 6 are considered as “hard,” 3, 4, and 7 as “soft” con- 
straints. More local constraints will be considered later. 


Solution. The flow time of a course is considered as a fuzzy number with a
membership function similar to that shown in figure 15-24. These fuzzy numbers
in L-R representation (see definition 5-6) are used to compute via fuzzy PERT a
fuzzy early starting date $\tilde r_i$ and a fuzzy late ending date $\tilde d_i$. If x denotes time, then the
interval $J_i$ in which the course i will be taught is a fuzzily bounded interval (see
figure 7-5), bounded by $\tilde r_i$ and $\tilde d_i$, respectively.

Figure 15-24. Flowtime of a course (membership grade over 0 to 5 weeks).

The membership function of these intervals $J_i$ is

$$
\mu_{J_i}(x) =
\begin{cases}
\mu_{\tilde r_i}(x) & \text{for } x \le r_i \\
1 & \text{for } x \in [r_i, d_i] \\
\mu_{\tilde d_i}(x) & \text{for } x \ge d_i
\end{cases}
$$

where $r_i$, $d_i$ are the mean values of $\tilde r_i$ and $\tilde d_i$, respectively.

The “global” constraints are taken into consideration successively: Constraints
1 and 2 are used as a basis for PERT; constraint 4 is used to compute whole
programs from single courses. And if constraint 5 is relevant, the intersection of the
different possibility intervals for all relevant courses in all affected instruction
programs is computed. Constraint 6 is taken care of similarly.

So far, the slack time for each course, the work load of each instructor, and
the number of courses per week for each instruction program have been
determined. Modifications of this schedule due to the availability of the instructors
can now be made, and the following “local” constraints are considered by
interactively changing schedules that have been generated automatically via heuristic
priority assignment. Figure 15-25 summarizes the entire process.

Figure 15-25. The scheduling process (flowchart elements: data; fuzzy P.E.R.T.;
instructors' availability; fuzzy subsets of weeks when a course can be planned;
computation of slack time, instructors' work load, and busy level of the weeks;
analysis of the realisability; modification of the data; heuristics: priority
evaluation; heuristics: week schedule; actualization during the quarter).

“Local” constraints are as follows: 


1. There exist precedence constraints between lectures and lab work inside a
   course (the graph of these constraints is not the same for all the courses).

2. An instructor can teach only one lecture at a given moment.

3. It is generally desirable to plan two lectures of the same course in
   succession, but not three.

4. It is not desirable that an instructor teach more than two lectures of different
   courses in the same morning.

5. It is desirable to give priority to lectures in the morning and lab work in the
   afternoon.


Example 15-4 


The following tables and figures can only serve to visualize the process. Details 
can be found in Prade [1977]. Figure 15-26 presents the data of one of the four 
instruction programs that were considered. All courses had to be scheduled within 
one quarter of 11 weeks. Table 15-17 gives the node number, name of courses, 
instructor number, and category (1 to 4 indicate different availabilities of the 
instructor). p, α, and β are the mean value and the left and right spreads of the
processing time for each course. The availability for each instructor is given in
table 15-18. Table 15-19 gives course numbers, initialized by the name of the 
instructor and early start and late finish times. 


Table 15-17. Structure of instruction program.

                           Instructor   Instructor   Processing time
N    Name                  number       category     α      p      β
1    A231 (L1 to 6)        9            3            0.5    2      1
2    A231 (L7 to 10)       9            3            0.5    1      1
3    A141 (L1 to 6)        12           3            0.5    2      1
4    A141 (L7 to 10)       12           3            0.5    1      1
5    A121                  12           3            1      3      1.5
6    A241                  12           3            1      3      1.5
7    A510                  9            3            1      3      1.5
8    M317 (L1 to 8)        17           4            1      3      1.5
9    M317 (L9 to 10)       17           4            1      3      1.5
10   PS16                  8            1            0      4      0
11   V231                  21           2            0      1      0
12   V211                  1            4            1      3      1.5
13   E541                  23           4            1      3      1.5
14   E551                  23           4            1      3      1.5
15   M361                  13           3            1      3      1.5
16   E531                  11           1            0      4      0
17   E532                  11           1            0      4      0
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Figure 15-26. Courses of one instruction program. 
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Table 15-18. Availability of instructors. 


Instructor   Weeks
number       1     2     3     4     5     6     7     8     9     10    11
1            0     0     0     0     0.5   1     1     1     1     1     0.5
8            0     0     0     0     0.5   1     1     1     1     0.5   0.5
9            1     1     1     1     1     1     1     1     1     0     1
11           0     0     0     0.5   1     1     1     1     1     1     1
12           1     1     0.5   0     1     1     1     1     1     1     1
13           1     1     1     0.5   0     1     1     1     1     0.5   0.5
17           1     1     1     1     1     1     0.5   0     0.5   1     1
21           1     1     0.5   0     0     0     0     0     0     0     0
23           1     1     1     1     1     1     0.5   0.5   0.5   0     0


Table 15-19. PERT output.

         Early start            Late finish
Name     α      r      β        α      d      β
A231     0      1      0        0      11     0
A141     0      1      0        2      6      2
A121     0      1      0        1.5    4      1
A241     1      4      1.5      0      11     0
A510     1      4      1.5      0      7      0
M317     0      1      0        0      11     0
PS16     2      7      3        0      11     0
V231     0      1      0        0      3      0
V211     0      5      0        0      11     0
E541     0      1      0        1.5    6      1
E551     1      4      1.5      0      9      0
M361     0      1      0        0      11     0
E531     0      4      0        0      7      0
E532     0      8      0        0      11     0


As reference (membership) functions for the fuzzy numbers in L-R
representation representing the “flowtimes” of the courses, Prade used
$L(x) = \exp(-x^2)$ and $R(x) = \max(0, 1 - x^2)$.

The intersection of the availability schedule (table 15-18) and the PERT 
schedule yields the possibility schedule of weeks in which courses can be sched- 
uled (table 15-20). Table 15-21 shows an example of the final schedule for the 
first week. 
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Table 15-20. Availability of weeks for courses. 


         Weeks
Name     1     2     3     4     5     6     7     8     9     10    11
A231     1     1     1     1     1     1     1     1     0     1     1
A141     1     1     0.5   0     1     1     0.8   0.4   0     0     0
A121     1     1     0.5   0     0.4   0     0     0     0     0     0
A241     0     0     0.4   0     1     1     1     1     1     1     1
M510     0     0     0.4   1     1     1     1     0     0     0     0
M317     1     1     1     1     1     1     0.5   0     0.5   1     1
PS16     0     0     0     0     0.4   0.8   1     1     1     0.5   0.5
V231     1     1     0.5   0     0     0     0     0     0     0     0
V211     0     0     0     0     0.5   1     1     1     1     1     0.5
E541     1     1     1     1     1     1     0.4   0     0     0     0
E551     0     0     0.4   1     1     1     0.5   0.5   0.5   0     0
M361     1     1     1     0.5   0     1     1     1     1     0.5   0.5
E531     0     0     0     0.5   1     1     1     0     0     0     0
E532     0     0     0     0     0     0     0     1     1     1     1

Table 15-21. First week’s final schedule. 


Morning Afternoon 
Monday A141 A141 A121 — 
L.1 L.2 L.1 
Tuesday A231 A231 M317 — 
L.1 L.2 L.1 
Wednesday A231 A231 M317 A231 
L.3 L.4 L.2 L.W.1 
Thursday A141 A141 A121 Sports 
Friday A121 A121 M317 A141 


15.3.4 Fuzzy Models in Inventory Control 


There exist a large number of inventory models in operations research using a 
great variety of methods for their solution. For inventory models using linear or 
integer linear models, the approach of section 14.2 or an algorithm described 
in Zimmermann and Pollatschek [1984] may be used. For solutions based on
differential calculus, the models in chapter 7 might be useful. Kacprzyk and
Staniewski [1982] present a very interesting approach for aggregate inventory
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planning, using primarily the concept presented in chapters 3 and 5 of this book. 
We shall present a model that uses Bellman and Zadeh’s approach to fuzzy 
dynamic programming discussed in section 4.3. 


Model 15-8 [Sommer 1981] 


The management of a company wants to close down a certain plant within a 
definite time interval. Therefore production levels should decrease to zero as 
steadily as possible and the stock level at the end of the planning horizon should 
be as low as possible. The demand is assumed to be deterministic. 


Mathematical model. Let

$d_i \in D$, $i = 1, \ldots, N$, be the decision variable representing the production
level in period i, where $D = \{\alpha_1, \ldots, \alpha_n\}$ is the set of values permitted for
the decisions;

$x_i \in X$, $i = 1, \ldots, N + 1$, be the state variable representing the inventory level
at the beginning of period i, where $X = \{\tau_1, \ldots, \tau_m\}$ is the set of possible state
values;

$a_i$, $i = 1, \ldots, N$, be the deterministic demand in period i;

$x_{i+1} = x_i + d_i - a_i$ be the crisp transformation function;

$\tilde C_i(d_i) = \{(d_i, \mu_{\tilde C_i}(d_i))\}$, $i = 1, \ldots, N$, be fuzzy constraints on the decision
variables representing the goal “production should decrease as steadily as
possible”; and

$\tilde G(x_{N+1}) = \{(x_{N+1}, \mu_{\tilde G}(x_{N+1}))\}$ be the fuzzy goal, representing the decision to have
as low a stock level as possible at the end of the planning horizon.


Then, using equation (14.20), the membership function of the decision on stage
i is

$$\mu_{\tilde D_i}(d_i) = \min\{\mu_{\tilde C_i}(d_i), \mu_{\tilde G_{i+1}}(x_{i+1})\}$$

and the membership function of the maximizing decision on stage i is

$$\mu_{\tilde D_i^0}(d_i^0) = \max_{d_i} \{\min\{\mu_{\tilde C_i}(d_i), \mu_{\tilde G_{i+1}}(x_{i+1})\}\}$$

which can be determined recursively using equation (14.23).
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As will be shown in the following numerical example, the state spaces can 


sometimes be reduced even further by introducing a bound on the basis of 
heuristic considerations. 


Example 15-5 


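Let

$$
\mu_{\tilde C_i}(d_i) =
\begin{cases}
0 & \text{if } 0 \le d_i \le 60 - 10i \\
-3 + 0.5i + d_i/20 & \text{if } 60 - 10i \le d_i \le 80 - 10i \\
5 - 0.5i - d_i/20 & \text{if } 80 - 10i \le d_i \le 100 - 10i \\
0 & \text{if } 100 - 10i \le d_i
\end{cases}
$$

and

$$
\mu_{\tilde G}(x_{N+1}) =
\begin{cases}
1 - x_{N+1}/20 & \text{if } 0 \le x_{N+1} \le 20 \\
0 & \text{else}
\end{cases}
$$

$a_1 = 45$, $a_2 = 50$, $a_3 = 45$, $a_4 = 60$, and $N = 4$.
$x_1$, the stock level at the beginning, is supposed to be 0.
$\{\tau_j\} = \{0, 5, 10, \ldots\}$
$\{\alpha_j\} = \{0, 5, 10, \ldots\}$


Only $\{d_i \mid \mu_{\tilde C_i}(d_i) > 0\}$ are of interest. Hence we can put a bound on the decision
variables as follows:


i     $d_i^l$     $d_i^u$
1     55          85
2     45          75
3     35          65
4     25          55


Also, $0 \le x_5 \le 20$.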

Using the transformation function, we can also find upper and lower bounds
for the state variables on the different intermediate stages. We proceed in three
steps: First we determine upper bounds $x_i^{u'}$ and lower bounds $x_i^{l'}$ from the forward
calculation. The corresponding bounds $x_i^{u''}$ and $x_i^{l''}$ from backward calculation are
computed in the second step. Then we can obtain the final bounds by

$$x_i^u = \min\{x_i^{u'}, x_i^{u''}\}$$
$$x_i^l = \max\{x_i^{l'}, x_i^{l''}\}$$


The lower bound for the state variable $x_i$ can be calculated as

$$x_i^{l'} = \max\{0,\ x_{i-1}^{l'} + d_{i-1}^l - a_{i-1}\}, \quad i = 2, \dots, 4$$

The appropriate upper bound is

$$x_i^{u'} = x_{i-1}^{u'} + d_{i-1}^u - a_{i-1}, \quad i = 2, \dots, 4$$


For the different stages we obtain, for $x_1 = 0$,


  i    $x_i^{l'}$    $x_i^{u'}$
  1       -             -
  2      10            40
  3       5            65
  4       0            85
  5       -             -


Starting with $x_5$ and assuming $x_5^{l''} = 0$ and $x_5^{u''} = 20$, we obtain recursively the
following upper and lower bounds:


  i    $x_i^{l''}$    $x_i^{u''}$
  1       -              -
  2       0             65
  3       0             60
  4       5             50
  5       -              -


The final upper and lower bounds can be determined by

$$x_i^l = \max\{x_i^{l'}, x_i^{l''}\}$$
$$x_i^u = \min\{x_i^{u'}, x_i^{u''}\}$$

Hence


  i    $x_i^l$    $x_i^u$
  1      0          0
  2     10         40
  3      5         60
  4      5         50
  5      0         15
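
The three-step bound calculation can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the author's implementation: the function names are hypothetical, the backward recursion is taken to be the mirror image of the forward formulas (the text prints only the forward ones), and the terminal upper bound is set to 15, the largest grid value with positive goal membership, because that is the value the tabulated backward bounds correspond to.

```python
# Demands a_i and decision bounds (d_i^l, d_i^u) from Example 15-5; stages are 1-based.
DEMAND = {1: 45, 2: 50, 3: 45, 4: 60}
D_LOW = {1: 55, 2: 45, 3: 35, 4: 25}
D_HIGH = {1: 85, 2: 75, 3: 65, 4: 55}
N = 4


def forward_bounds(x1=0):
    """Propagate state bounds from stage 1 to stage N."""
    low, high = {1: x1}, {1: x1}
    for i in range(2, N + 1):
        low[i] = max(0, low[i - 1] + D_LOW[i - 1] - DEMAND[i - 1])
        high[i] = high[i - 1] + D_HIGH[i - 1] - DEMAND[i - 1]
    return low, high


def backward_bounds(end_low=0, end_high=15):
    """Propagate state bounds from stage N+1 back to stage 2.

    Assumption: x_i = x_{i+1} - d_i + a_i, so the lower bound uses the
    largest decision and the upper bound the smallest one.
    """
    low, high = {N + 1: end_low}, {N + 1: end_high}
    for i in range(N, 1, -1):
        low[i] = max(0, low[i + 1] - D_HIGH[i] + DEMAND[i])
        high[i] = high[i + 1] - D_LOW[i] + DEMAND[i]
    return low, high


def final_bounds():
    """Intersect forward and backward bounds for the intermediate stages."""
    f_low, f_high = forward_bounds()
    b_low, b_high = backward_bounds()
    bounds = {1: (0, 0), N + 1: (b_low[N + 1], b_high[N + 1])}
    for i in range(2, N + 1):
        bounds[i] = (max(f_low[i], b_low[i]), min(f_high[i], b_high[i]))
    return bounds
```

Under these assumptions `final_bounds()` returns the same intervals as the last table above, for example (5, 60) for stage 3.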


Now we can determine the optimal d; and x; within the lower and upper bounds 
computed above: 


Stage 1: Using equation (14.23), we obtain

$$\mu_{\tilde{G}_4}(x_4) = \max_{d_4}\{\min[\mu_{\tilde{C}_4}(d_4),\ \mu_{\tilde{G}_5}(x_4, d_4)]\}$$
$$= \max_{d_4}\{\min[\mu_{\tilde{C}_4}(d_4),\ \mu_{\tilde{G}_5}(x_4 + d_4 - a_4)]\}$$


                        $d_4$
$x_4$   25    30    35    40    45    50    55    $\mu_{\tilde{G}_4}(x_4)$
  5                                        1/4    1/4
 10                                  1/2   1/4    1/2
 15                            3/4   1/2   1/4    3/4
 20                        1   3/4   1/2   1/4    1
 25                3/4   3/4   1/2   1/4          3/4
 30          1/2   3/4   1/2   1/4                3/4
 35    1/4   1/2   1/2   1/4                      1/2
 40    1/4   1/2   1/4                            1/2
 45    1/4   1/4                                  1/4
 50    1/4                                        1/4


Stage 2: $\mu_{\tilde{G}_3}(x_3) = \max_{d_3}\{\min[\mu_{\tilde{C}_3}(d_3),\ \mu_{\tilde{G}_4}(x_3 + d_3 - a_3)]\}$


                        $d_3$
$x_3$   35    40    45    50    55    60    65    $\mu_{\tilde{G}_3}(x_3)$
  5                1/4   1/2   3/4   1/2   1/4    3/4
 10          1/4   1/2   3/4   3/4   1/2   1/4    3/4
 15    1/4   1/2   3/4     1   3/4   1/2   1/4    1
 20    1/4   1/2   3/4   3/4   3/4   1/2   1/4    3/4
 25    1/4   1/2   3/4   3/4   1/2   1/2   1/4    3/4
 30    1/4   1/2   3/4   1/2   1/2   1/4   1/4    3/4
 35    1/4   1/2   1/2   1/2   1/4   1/4          1/2
 40    1/4   1/2   1/2   1/4   1/4                1/2
 45    1/4   1/2   1/4   1/4                      1/2
 50    1/4   1/4   1/4                            1/4
 55    1/4   1/4                                  1/4
 60    1/4                                        1/4


Stage 3: $\mu_{\tilde{G}_2}(x_2) = \max_{d_2}\{\min[\mu_{\tilde{C}_2}(d_2),\ \mu_{\tilde{G}_3}(x_2 + d_2 - a_2)]\}$


                        $d_2$
$x_2$   45    50    55    60    65    70    75    $\mu_{\tilde{G}_2}(x_2)$
 10    1/4   1/2   3/4   3/4   3/4   1/2   1/4    3/4
 15    1/4   1/2   3/4   3/4   3/4   1/2   1/4    3/4
 20    1/4   1/2   3/4   3/4   1/2   1/2   1/4    3/4
 25    1/4   1/2   3/4   1/2   1/2   1/2   1/4    3/4
 30    1/4   1/2   1/2   1/2   1/2   1/4   1/4    1/2
 35    1/4   1/2   1/2   1/2   1/4   1/4   1/4    1/2
 40    1/4   1/2   1/2   1/4   1/4   1/4          1/2

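Stage 4: $\mu_{\tilde{G}_1}(x_1) = \max_{d_1}\{\min[\mu_{\tilde{C}_1}(d_1),\ \mu_{\tilde{G}_2}(x_1 + d_1 - a_1)]\}$


                        $d_1$
$x_1$   55    60    65    70    75    80    85    $\mu_{\tilde{G}_1}(x_1)$
  0    1/4   1/2   3/4   3/4   1/2   1/2   1/4    3/4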
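
The stage-wise recursion of Example 15-5 can be reproduced with a short Python sketch. It is meant as an illustration under stated assumptions rather than as the book's procedure: the function and variable names are hypothetical, exact fractions are used so that values compare directly with the tabulated quarters, and the search runs over a full grid from 0 to 100 instead of the reduced state space derived above (combinations outside the bounds simply receive membership 0).

```python
from fractions import Fraction

DEMAND = {1: 45, 2: 50, 3: 45, 4: 60}  # a_1..a_4 from Example 15-5
N = 4
STEP = 5                                # grid width for d_i and x_i


def mu_constraint(i, d):
    """Triangular membership of the fuzzy constraint on d_i."""
    d = Fraction(d)
    if 60 - 10 * i < d <= 80 - 10 * i:
        return Fraction(-3) + Fraction(i, 2) + d / 20
    if 80 - 10 * i < d <= 100 - 10 * i:
        return Fraction(5) - Fraction(i, 2) - d / 20
    return Fraction(0)


def mu_goal(x):
    """Membership of the fuzzy goal on the final stock level x_{N+1}."""
    return 1 - Fraction(x, 20) if 0 <= x <= 20 else Fraction(0)


def solve(limit=100):
    """Backward max-min recursion; returns memberships and one maximizing d per state."""
    grid = range(0, limit + STEP, STEP)
    mu_g = {N + 1: {x: mu_goal(x) for x in grid}}
    best_d = {}
    for i in range(N, 0, -1):
        mu_g[i], best_d[i] = {}, {}
        for x in grid:
            value, arg = Fraction(0), None
            for d in grid:
                nxt = x + d - DEMAND[i]  # crisp transformation function
                cand = min(mu_constraint(i, d), mu_g[i + 1].get(nxt, Fraction(0)))
                if cand > value:         # ties keep the smallest d found first
                    value, arg = cand, d
            mu_g[i][x], best_d[i][x] = value, arg
    return mu_g, best_d
```

Calling `mu_g, best_d = solve()` gives `mu_g[1][0]` equal to 3/4, the largest entry in the stage 4 row. Because ties are broken in favor of the smallest decision, `best_d` records only one of possibly several maximizing decisions per state.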
15.3.5 Fuzzy Sets in Marketing


Classical applications of fuzzy sets in marketing, such as media selection [Wiedey
and Zimmermann 1978], are too similar to other applications of fuzzy linear
programming to be discussed here again. We shall rather turn to recent
applications of fuzzy technology in marketing, which are based primarily on data
mining. The motivation and justification of these applications is, of course, the
change in data availability that has already been mentioned.

So-called database marketing became feasible only after enough data were
available. Unfortunately, the volume of data grew so fast that it quickly exceeded
the human capacity to perceive complex and poorly structured data pools.

In the following we shall focus on the area of market segmentation and present 
first an easy to understand static problem and then a more complicated dynamic 
problem. 


15.3.5.1 Customer Segmentation in Banking and Finance. Banks and insurance
companies nowadays have masses of data about customers stored in their
data banks and data warehouses. They have a number of products that they want to
offer their customers, such as shares, bonds, derivatives, etc., and they develop new
products. On the other hand, each individual customer has certain wishes and
needs and prefers some products to others, i.e., customers have certain product
affinities. If a bank offered all its products to all its customers (for instance, by
a mailing), this would be very costly, the relative effectiveness would be very
low, and the customers might even be frustrated by getting that much mail.

In database marketing the bank tries to subdivide its customers into segments
that are as homogeneous as possible with respect to the needs of the customers
in a segment. One can then offer special products only to segments that have a
high demand for them. This is called customer segmentation.

Traditionally the features of these segments are defined as crisp intervals con- 
cerning, e.g. property, debt, income, balance of account, age, etc. and nominal 
features, such as sex, marital status, profession, etc. 

The main disadvantages of these feature definitions are that there is no
compensation between the features, that wrong classifications occur, and that dynamic
changes of the customers cannot be accounted for. If fuzzy analysis, e.g., fuzzy
clustering, is used, marginal customers are better classified, existing
compensations can be considered, and dynamic changes may be recognized via changes in
the degrees of membership of customers in clusters.


Example 15-6 


In this experimental study of 300 customers the following features were used:

Income
Credit
Age
Property
Profit (of the bank).


Traditionally the bank distinguished three classes:

Class 1: Annual income less than DM 30,000
         Property less than DM 40,000
Class 2: Income between DM 30,000 and 80,000
         Property between DM 40,000 and 200,000
Class 3: Income more than DM 80,000
         Property more than DM 200,000


The marketing effect turned out to be very unsatisfactory. 

Subsequently fuzzy clustering was used, namely the fuzzy c-means algorithm
described in chapter 13. It turned out that nine classes represented the optimal
number of classes, and the class centers shown in table 15-22 were determined.

These classes would certainly not have been found by a traditional
classification. At first glance those classes do not really make much sense. When
the classes were shown to marketing experts, however, the experts found the
following very plausible description of the classes:


Class:  Content:

1       more than 60-year-old persons with low income and a certain property
2       in training
3       3rd stage of life, high income and property
4       3rd stage of life, low income and property
5       career persons
6       high senior citizens’ segment
7       junior segment
8       persons with high credit capacity
9       socially weak segment
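
To illustrate how graded memberships treat marginal customers, the following Python sketch evaluates the standard fuzzy c-means membership formula against the class centers of table 15-22. Several things here are assumptions and not part of the study: the fuzzifier m = 2, the scaling of each feature by the range of the class centers (the text does not say how features were normalized), the function names, and the example customer.

```python
import math

# Class centers from table 15-22: (age, income, property, credit, profit).
CENTERS = {
    1: (63, 2585, 10485, 2965, 45),
    2: (28, 2300, 8020, 3200, 24),
    3: (52, 5260, 50920, 7830, 256),
    4: (53, 3200, 20785, 6040, 165),
    5: (43, 6240, 22680, 8925, 117),
    6: (78, 2190, 34280, 1185, 316),
    7: (9, 235, 3285, 230, 5),
    8: (42, 1120, 15705, 150060, 930),
    9: (37, 955, 13405, 7302, 0),
}


def feature_ranges(centers):
    """Range of each feature over the centers (hypothetical scaling choice)."""
    columns = list(zip(*centers.values()))
    return [max(col) - min(col) for col in columns]


def memberships(customer, centers=CENTERS, m=2.0):
    """Fuzzy c-means membership degrees of one customer in all classes."""
    widths = feature_ranges(centers)
    dist = {
        k: math.sqrt(sum(((x - c) / w) ** 2 for x, c, w in zip(customer, v, widths)))
        for k, v in centers.items()
    }
    exact = [k for k, d in dist.items() if d == 0.0]
    if exact:  # customer coincides with a center
        return {k: (1.0 if k == exact[0] else 0.0) for k in dist}
    power = 2.0 / (m - 1.0)
    return {k: 1.0 / sum((dist[k] / dj) ** power for dj in dist.values())
            for k in dist}
```

For a hypothetical customer such as `(30, 2400, 9000, 3000, 30)` the degrees sum to 1 and the largest one falls on class 2; the remaining degrees show how close the customer is to neighboring segments, which a crisp interval classification cannot express.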


15.3.5.2 Bank Customer Segmentation based on Customer Behavior
[Angstenberger 2001]. In contrast to the last section, this time it is not the
present status of the customers (a snapshot) but their dynamic behavior that will
be used for segmentation. The study described here is much larger than that in the
last section. It concerns 24,267 customers of a commercial bank and is
described in detail in [Angstenberger 2001]. Here we summarize the problem,
the process, and the results.

Table 15-22. Cluster centers of nine optimal classes.

           Age   Income   Property    Credit   Profit
class 1     63    2,585     10,485     2,965       45
class 2     28    2,300      8,020     3,200       24
class 3     52    5,260     50,920     7,830      256
class 4     53    3,200     20,785     6,040      165
class 5     43    6,240     22,680     8,925      117
class 6     78    2,190     34,280     1,185      316
class 7      9      235      3,285       230        5
class 8     42    1,120     15,705   150,060      930
class 9     37      955     13,405     7,302        0


Table 15-23. Dynamic features describing bank customers.

Feature   Description

1         Overdraw limit on account
2         Current end-of-month balance
3         Maximum balance this month
4         Minimum balance this month
5         Average credit utilization this month
6         Credit turnover this month
7         Number of bank-initiated payment reversals, i.e.,
          returned cheques/cancelled direct debits in
          the current month


The bank customers are described by two static features, seven dynamic fea- 
tures and one categorical feature. The first two features are used as unique 
identification numbers of customers, such as the customer number and account 
number. The seven dynamic features characterize the state of an account each 
month and are represented by sequences of 24 measurements. They are summa- 
rized in table 15-23: 

The categorical feature provided for bank customers indicates special
account properties, which can be savings / time deposits / depots, and can take two
values, “yes” or “no”. According to bank experts, customers with or
without these account properties must be treated separately since they may exhibit
different payment behavior. Therefore, the data set will be separated into two
subsets according to this categorical feature. The first set of customers,
characterized by the feature value “yes” and possessing a savings account or depots,
consists of 4,688 customers, while the other set includes 19,579 customers without
the said properties of their accounts. These two sets of customers will be denoted
hereinafter as groups “Y” and “N”, respectively, and the analysis of the customer
structure will be performed separately for each group.

After a preliminary analysis of data sets including the calculation of the mean, 
minimum and maximum values of trajectories and their variances, it can be seen 
that the value ranges of the seven features are very large and different. The main 
statistical characteristics of the data group “Y” are summarized in table 15-24. 

Those of the data group “N” are shown in table 15-25.
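
The last two columns of tables 15-24 and 15-25 are the three-sigma bounds of each feature. As a small worked check, the sketch below recomputes them for two features of group “Y” from the tabulated means and standard deviations; the names are hypothetical, and small deviations in the second decimal are to be expected because the tabulated inputs are already rounded.

```python
# (mean, standard deviation) for two features of group "Y", from table 15-24.
GROUP_Y = {
    1: (14510.06, 48097.47),
    3: (7.77, 100780.11),
}


def three_sigma_range(mean, std):
    """Return (mean - 3 * std, mean + 3 * std)."""
    return mean - 3.0 * std, mean + 3.0 * std


def range_width(feature):
    """Width of the three-sigma interval of one feature."""
    low, high = three_sigma_range(*GROUP_Y[feature])
    return high - low
```

Comparing `range_width(1)` with `range_width(3)` shows intervals that differ by roughly a factor of two, which illustrates why the value ranges are described as very large and different.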

The goals of the dynamic analysis of bank customers can be formulated as 
follows: 


1. to find segments of customers with similar payment behavior based on the 
whole temporal history covering two years; 

2. to find segments of customers with similar payment behavior based on the 
temporal history of half a year, and to follow changes in the cluster structure 
and in the assignment of customers to the clusters over time. 


Table 15-24. Main statistics of each feature of the data group “Y”.

Feature   Mean value μ   Standard deviation σ           μ - 3σ          μ + 3σ

1            14,510.06              48,097.47      -129,782.34      158,802.46
2            16,347.19             134,581.82      -387,398.26      420,092.63
3                 7.77             100,780.11      -302,332.56      302,348.10
4            37,655.46             273,818.07      -783,798.76      859,109.67
5             7,946.71              36,071.66      -100,268.26      116,161.68
6            81,002.74             636,081.44    -1,827,241.60    1,989,247.10
7                 0.01                   0.00            -0.11            0.12

Table 15-25. Main statistics of each feature of data group “N”.

Feature   Mean value μ   Standard deviation σ           μ - 3σ          μ + 3σ

1            20,811.16              63,966.97      -171,089.75      212,712.06
2            -4,278.04             145,423.24      -440,547.76      431,991.68
3           -13,393.86             153,139.98      -472,813.81      446,026.08
4             7,360.17             157,022.60      -463,707.63      478,427.98
5            17,426.61             123,226.32      -352,252.36      387,105.59
6            35,406.50             318,825.34      -921,069.54      991,882.53
7                 0.01                   0.00            -0.10            0.11

The first goal can be achieved by clustering customers represented by trajectories
of their features on the time interval of two years. The clustering results
provide information about the structure within the customer portfolio appearing 
during this time interval until the current moment. These results are suitable for 
distinguishing between “good” and “bad” customers according to their long-term 
payment behavior. The analysis of a long history is often carried out by banks to 
achieve reliable results, particularly for recognizing “bad” credit customers. The 
drawback of this analysis is, however, that the classifier cannot be used to clas- 
sify new observations of existing customers or observations of new customers 
for the next two years, since the cluster prototypes are described by trajectories 
with a length of 24 months and thus cannot be compared with shorter sequences 
of observations. Thus, the classification of new observations and updating the 
classifier (if necessary) can be repeated every two years. In this case the design 
of the classifier is static, but the classifier is dynamic in nature since it is applied 
to dynamic objects. 

A more applicable classifier can be designed by clustering sequences of obser- 
vations over half a year, which is the second goal of the analysis conducted. This 
analysis allows one to recognize customer segments based on the short-term 
payment behavior of customers and to detect temporal changes in the customer 
behavior. The classification of new observations of existing or new customers can 
be repeated every six months providing up-to-date information about the cus- 
tomers’ states and their development. If changes in the customer structure are 
detected, the classifier is adapted according to the detected changes, which cor- 
responds to an update of the customer segments and their descriptions. There- 
fore, this type of analysis is based on dynamic classifier design and classification 
applied to dynamic objects. 

The following tasks will be performed, using pointwise similarity as defined
in section 13.3.2.


Table 15-26. Scope of the analysis of bank customers.

I. Length of temporal history        II. Type of customers      III. Type of similarity
                                                                measure for trajectories
The whole temporal history           Customers of group ‘Y’     Pointwise similarity
t = [1, 24]
Time windows equal to half a year    Customers of group ‘N’


It would exceed the scope of this book to explain in detail the algorithmic
steps performed in this study. It should be mentioned, however, that pointwise
similarity is defined by

$$\mu(y, a) = \frac{1}{1 + a y^2}$$

which defines the membership function of the fuzzy set “approximately zero”,
which was described in figure 13-22.

$a$ is defined as a function of the level $\alpha$ of the $\alpha$-cut chosen for this fuzzy set, and $\beta$ is
a parameter that determines the shape of the membership function:

$$a = \frac{1 - \alpha}{\alpha \beta^2}$$
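
A pointwise similarity between two trajectories can be sketched as follows. This is a hedged illustration, not the algorithm of [Angstenberger 2001]: it assumes the membership function of “approximately zero” has the form 1/(1 + a y^2), which is the form under which the relation a = (1 - α)/(α β^2) makes the membership at y = β equal to α; it aggregates the pointwise degrees by the arithmetic mean; and the function names and default parameter values are hypothetical.

```python
def shape_parameter(alpha, beta):
    """a = (1 - alpha) / (alpha * beta**2)."""
    return (1.0 - alpha) / (alpha * beta ** 2)


def approximately_zero(y, a):
    """Assumed membership of a difference y in the fuzzy set 'approximately zero'."""
    return 1.0 / (1.0 + a * y * y)


def pointwise_similarity(traj_x, traj_y, alpha=0.5, beta=1.0):
    """Mean membership of the pointwise differences of two equally long trajectories."""
    if len(traj_x) != len(traj_y) or not traj_x:
        raise ValueError("trajectories must be non-empty and of equal length")
    a = shape_parameter(alpha, beta)
    degrees = [approximately_zero(x - y, a) for x, y in zip(traj_x, traj_y)]
    return sum(degrees) / len(degrees)
```

Identical trajectories obtain similarity 1, and two trajectories that differ by exactly β at every point obtain similarity α, so β can be read as the difference that is still tolerated at level α.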


The arithmetic mean is used for aggregation. For other parameter settings see
[Angstenberger 2001]. When segmenting (clustering) customers, it is important
that as many customers as possible are absorbed by (belong to) a cluster and
that the customers outside of clusters (here called “stray customers”) are not too
numerous. This depends, amongst other parameters, on the α-cut and the number
of clusters used. For the customer group “Y”, table 15-27 shows the number of
stray customers for different α-cuts (here called μ°) and for two clusters. For
μ° = 0.5 the numbers for 3 and 4 clusters are 692 and 560, respectively.

Table 15-28 shows the respective results for N-customers. 

For this group the numbers of stray-customers are 4,563 for c = 3 and 14,270 
for c = 4. 

Hence, for both groups the number of clusters c = 2 is optimal. 
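
The selection of the number of clusters can be restated as a tiny calculation. The sketch below uses only the stray-customer counts quoted for μ° = 0.5; the rule “choose the c with the fewest stray customers” is the criterion the text implies, and the names are hypothetical.

```python
# Stray customers at mu0 = 0.5 by number of clusters c, as quoted in the text.
STRAY = {
    "Y": {2: 240, 3: 692, 4: 560},
    "N": {2: 4143, 3: 4563, 4: 14270},
}
GROUP_SIZE = {"Y": 4688, "N": 19579}


def best_cluster_count(group):
    """Number of clusters with the fewest stray customers."""
    counts = STRAY[group]
    return min(counts, key=counts.get)


def stray_share(group, c):
    """Fraction of the group left outside all clusters."""
    return STRAY[group][c] / GROUP_SIZE[group]
```

Both `best_cluster_count("Y")` and `best_cluster_count("N")` give 2, and `stray_share` expresses the counts as fractions of each group, about 5% for group “Y” and about 21% for group “N” at c = 2.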

The features used for clustering are now trajectories and not points. The
following two figures show this, by way of example, for the cluster centers of
feature 1 for the customer groups “Y” and “N”.

So far the segmentation was performed on the basis of the whole 24-month
history. A similar analysis was done on the basis of a moving time window of six
months with similar results. It showed quite well how customers moved from one
cluster to another over time.


Table 15-27. Absorbed and stray customers for “Y”-group.

                  Absorbed
              C1         C2        Stray
μ° = 0.3      3,114      1,419       155
μ° = 0.4      3,099      1,396       193
μ° = 0.5      3,081      1,367       240
μ° = 0.6      3,065      1,308       315

Table 15-28. Absorbed and stray customers for “N”-group.

                  Absorbed
              C1         C2        Stray
μ° = 0.3     12,227      5,975     1,377
μ° = 0.4     11,073      5,964     2,542
μ° = 0.5      9,494      5,942     4,143
μ° = 0.6      6,479      5,801     7,299


Figure 15-27. Feature 1: current end-of-month balance for “Y”.


Tables 15-29 and 15-30 indicate this movement of customers.

After conducting four types of analysis for different customer groups and for
different lengths of the temporal history, it is necessary to compare the results
obtained. It has already been stated that the customer segments recognized based
on the whole temporal history and in the first time window are very similar;
however, the feature values characterizing cluster centers in the first case are
somewhat larger in absolute value than those in the second case.

Figure 15-28. Feature 1: current end-of-month balance for “N”.


Table 15-29. Temporal change of assignment of customers in group “Y” to
clusters.

                          From tw1 to tw2     From tw2 to tw3     From tw3 to tw4
Number of customers         C1       C2         C1       C2         C1       C2
Remained in Ci           1,118    2,947      1,046    2,921      1,120    2,997
Moved from C1 into C2          64                  91                  45
Moved from C2 into C1          31                 102                  40
Dropped out of Ci           55       74         77       82         75      106
Appeared in Ci              94       92        131       65         50       65


Table 15-30. Temporal change of assignment of customers in group “N” to
clusters.

                          From tw1 to tw2     From tw2 to tw3     From tw3 to tw4
Number of customers         C1       C2         C1       C2         C1       C2
Remained in Ci          10,466    4,323      9,951    4,045     10,250    4,321
Moved from C1 into C2         108                 336                 104
Moved from C2 into C1         163                 235                 123
Dropped out of Ci          653      501        911      795        612      571
Appeared in Ci             569      644        780      634        672      633


Comparing the results for customers in groups “Y” and “N,” it can be seen that
the values of the end-of-month balance of customers in group “Y” exceed the
corresponding values of customers in group “N” and vary over a larger value
range. The credit turnover (not shown here) of the first customer group is
approximately 10,000-20,000 DM larger than the values of the other customer group,
whereas the credit utilization (not shown here) of the active users is
20,000-30,000 DM lower. Therefore, customers in group “Y” belonging to the
segment of “active users” have more entries in their accounts, higher monthly
account statements and use bank credit less actively than customers in group “N”.
Customers in the second segment, “non-users”, are similar in their behavior for
both groups of customers.

The results of the analysis conducted in this section can help a bank to better
understand its customer portfolio and to distinguish between different groups of
active users and non-users in order to develop a particular marketing
strategy, for instance, offering special favorable services to a group
of the most active users.

Bank customer segmentation was carried out based on dynamic data
representing customers’ temporal behavior and by applying the dynamic fuzzy
clustering algorithm. The dynamic analysis allows one to take into consideration the
payment behavior of customers over a period of time, which characterizes
customers much better than a single observation. Until now, in most applications
related to customer segmentation described in the literature, a static analysis
of customers was performed based on measurements at a certain moment in
time. Such results are obviously not very reliable, since clusters, or customer
segments, obtained from such an analysis can often change due to significant
fluctuations of account feature values, which requires periodic reclustering. By
contrast, dynamic fuzzy clustering helps to save time and can provide more
reliable and complete results.


Exercises 


1. In what ways and for what purposes can fuzzy sets be used in operations 
research? 

2. Explain why in model 14-3 every nondecreasing operator can be used to 
combine the goal with all of the constraints. 

3. Could approaches (13.9) or (13.18) have been used in model 14-2? If so, 
what would have been the consequences? 

4. In section 14.3.1, a fuzzy decision model has been employed as an opti- 
mization criterion. Can this approach be used for both precise and heuris- 
tic algorithms? 

5. In the system presented in section 14.3.2, the decision support module picks
one precedence constraint out of the subset C of the set of all unordered
pairs of operations. Consider multiple criteria for the selection of the subset
C. Discuss possible fuzzy aggregation models for the derivation of the
subset C.

6. Assume in model 14-7 that the instructors’ availability is given by the
following table:

Instructor                           Weeks
number        1    2    3    4    5    6    7    8    9   10   11

 1            0    0    0   .5   .5    1    1    1   .5    0    0
 8            0    0   .5    1    1    1    1   .5   .5    0    0
 9           .5    1    1    1    1    1    1   .5   .5    0    0
11            0    0    0   .5   .5    1    1    1   .5   .5    0
12            1    1   .5   .5    1    1    1    1    1    1    1
13            0    1    1   .5   .5    1    1    1   .5    0    0
17            1    1    1    1    1    1    1    1    1    1    1
21           .5   .5    1    1   .5    0    0    0    0    0    0
23            1    1    1    1   .5   .5   .5    0    0    0    0

Determine a new table 14-15 of available weeks for courses, and try to
determine heuristically a first week’s final schedule.

7. Discuss approaches, and their advantages and disadvantages, for PERT
networks in which activity times are fuzzy and stochastically uncertain.

8. Determine the critical path for the network shown in table 14-14 by
substituting for the addition of activity time in the normal critical path method
the extended addition (section 5.3.1).

9. Determine an optimal policy for model 14-8 modified as follows: the
demands are $a_i = \{40, 40, 45, 50\}$, and the fuzzy set goal is characterized
by the membership function

$$\mu_{\tilde{G}_{N+1}}(x_{N+1}) = \begin{cases}
1 - x_{N+1}/10 & \text{if } 0 \le x_{N+1} \le 10 \\
0 & \text{else}
\end{cases}$$

10. In the example shown in table 14-18, the membership degrees of the
districtings were evaluated subjectively by the decision maker. Consider fuzzy
accessibility measures for the “nearness” or “accessibility” of a service
point to every other point in a district for a location problem larger than
that shown in figure 14-15. Develop a model in which these accessibility
measures are aggregated to a fuzzy measure for the “acceptability” of every
district and are further aggregated to a fuzzy measure for the membership
degree of every districting to the fuzzy set of “best districtings.” Discuss
the sensitivity of such an approach to the choice of the intersection
operator.

11. Discuss the possible use of expert systems and FLC models in operations
research. Do those approaches satisfy sound OR principles?


16 EMPIRICAL RESEARCH
IN FUZZY SET THEORY


16.1 Formal Theories vs. Factual Theories vs. 
Decision Technologies 


The terms model, theory, and law have been used with a variety of meanings, for 
a number of purposes, and in many different areas of our lives. It is therefore 
necessary to define more accurately what we mean by models, theories, and laws 
in order to describe their interrelationships and to indicate their use before we 
can specify the requirements they have to satisfy and the purposes for which they 
can be used. To facilitate our task, we shall distinguish between definitions given 
and used in the scientific area and definitions and interpretations as they can be 
found in more application oriented areas, which we will call “technologies” in 
contrast to “scientific disciplines.” By technologies we mean areas such as oper- 
ations research, decision analysis, and information processing, even though these 
areas sometimes call themselves theories (e.g., decision theory) and sometimes
science (e.g., computer science, management science, etc.). This statement is by
no means a value judgment; we only want to indicate that the main goals of these 
areas are different. While the main purpose of a scientific discipline is to gener- 
ate knowledge and to come closer to truth without making any value judgments, 
technologies normally try to generate tools for solving problems better, very often
by either accepting or being based on given value schemes. Let us first turn to
the area of scientific inquiry and consider the following quotation concerning the 
definition of the term model: “A possible realization in which all valid sentences 
of a theory T are satisfied is called a model of T.” 

Harré [1967, p. 86] states, “A model, a, of a thing, A, is in one of many pos- 
sible ways a replica or an analogue of A.” And a few years later, “In certain formal 
sciences such as logic and mathematics a model for or of a theory is a set of sen- 
tences, which can be matched with the sentences in which the theory is expressed, 
according to some matching rule. ... The other meaning of ‘model’ is that of 
some real or imagined thing or process, which behave similarly to some other 
thing or process, or in some other way than in its behavior is similar to it” [Harré, 
1972, p. 173]. He sees two major purposes of models in science: (1) logical: to 
enable certain inferences to be made that would not otherwise be possible; and 
(2) epistemological: to express and enable us to extend our knowledge of the 
world. Models, according to Harré, are used either as a heuristic to simplify a 
phenomenon or to make it more readily manageable and explanatory where a 
model is a model of the real causal mechanism. 

Leo Apostel [1961, p. 4] provides us with a very good example for various 
definitions of models as tuples of a number of components in the following def- 
inition: “Let then R (S, P, M, T) indicate the main variables of the modelling rela- 
tionship. The subject S takes, in view of the purpose P, the entity M as a model 
of the prototype T.” For the four components of the definition, he gives a number 
of examples that are quite informative concerning the use of models in science 
and that can be summarized as follows: 

Subjects (S) and purposes (P): 


1. For a certain domain of facts, let no theory be known. If we replace our study 
of this domain by the study of another set of facts for which a theory is well 
known and that has certain important characteristics in common with the field 
under investigation, then we use a model to develop our knowledge from a 
zero (or near zero) starting point. 

2. For a domain D of facts, we do have a full-fledged theory, but one too diffi- 
cult mathematically to yield solutions, given our present techniques. We then 
interpret the fundamental notions of the theory in a model, in such a way that 
simplifying assumptions can express this assignment. 

3. If two theories are without contact with each other, we can try to use the one 
as model for the other or introduce a common model interpreting both and 
thus relating both languages to each other. 

4. If a theory is well confirmed but incomplete, we can assign a model in the
hope of achieving completeness through the study of this model. 

5. Conversely, if new information is obtained about a domain, to assure
ourselves that the new and more general theory still concerns our earlier domain,
we construct the earlier domain as a model of the later theory and show that
all models of this theory are related to the initial domain, constructed as
model, in a specific way.

6. Even if we have a theory about a set of facts, this does not mean that we
have explained those facts. Models can yield such explanations.

7. Let a theory be needed about an object that is too big or too small, too far
away, or too dangerous to be observed or experimented upon. Systems are
then constructed that can be used as practical models, experiments that can
be taken as sufficiently representative of the first system to yield the desired
information.

8. Often we need to have a theory present to our mind as a whole for
practical or theoretical purposes. A model realizes this globalization through
either visualization or realization of a closed formal structure.


Thus, models can be used for theory formation, simplification, reduction, 
extension, adequateness, explanation, concretization, globalization, action, or 
experimentation. 

Entity (M) and model type (T): 

M and T are both images or both perceptions or both drawings or both for- 
malisms (calculi) or both languages or both physical systems. M can also be a 
calculus and T a theory or language, or vice versa. 


Apostel believes that all models that can be constructed by varying the contents of the 
four components form a systematic whole: Models are used for system restructuration 
because of their relations with the system (partial discrepancy); because of their rela- 
tionship among each other (partial inconsistency at least multiplicity); because of their 
relationship with themselves (locally inconsistent or locally vague). 


By now two things should have become obvious: 


1. There is a very large variety of types of models, which can be classified
according to a number of criteria. For our deliberation, one classification
seems to be particularly important: The interpretation of a model as a “formal
model” and the interpretation as a “factual, descriptive model.” This
corresponds to Rudolf Carnap’s distinction between a logical and a descriptive
interpretation of a calculus [Carnap 1946]. For him, a logically true
interpretation of a model exists if, whenever a sentence is true, the second is equally
true and if, whenever a sentence is refutable in the calculus, it is also false in
the model. An interpretation is a factual interpretation if it is not a logical
interpretation, which means that whether a model is true or false does not depend
only on its logical consistency but also on the (empirical) relationship of the
sentences (axioms of the model) to the properties of the factual system of
which the model is supposed to be an image. The second interpretation of a
model is the one that is quite common in the empirical sciences and it is the
one we will primarily be referring to in the following.

2. There is certainly a relationship between a model and a theory. This rela- 
tionship, however, is seen differently by different scientists and by different 
scientific disciplines. We will now try to specify this relationship because 
theories, to our mind, are the focal point of all scientific activities. 


For Harré [1972, p. 174] “A theory is often nothing but the description and 
exploitation of some model,” or “Development of a theory on the other involves 
the superimposing of one model on another” [1967, p. 99]. 

White [1975] eventually simply points out that 


There is a need to logically separate a model and a theory and that they play support- 
ing roles in decision analysis, viz., some theory is needed so that aspects of models can 
be tested and that some model is needed so that the affects of some changes can be 
examined. In particular validation of a model needs a theory. 


Thus, there seems to be a very intimate relationship between a model and a 
theory in scientific inquiry. Both, probably to varying degrees, are based on 
hypotheses, and these hypotheses can either be formal axioms or scientific laws. 
These scientific laws seem to us to fundamentally distinguish models and theo- 
ries in scientific disciplines from the type of models (sometimes also called 
theories) in the more applied areas: “An experimental law, unlike a theoretical
statement, invariably possesses a determinate empirical content which in
principle can always be controlled by observational evidence obtained by those
procedures” [Nagel 1969, p. 83].

These laws as scientific laws assert invariance with respect to time and space. 
The tests to which such hypotheses have to be put before they can claim to be a 
law depend on the philosophical direction of the scientist. Karl Popper, as prob- 
ably the most prominent representative of “critical rationalism,” believes that 
laws are only testable by the fact of their falsifiability. Popper holds further that 
a hypothesis is “corroborated” (rather than confirmed) to the degree of severity 
of such tests. Such a corroborated hypothesis may be said to have stood up to the 
test thus far without being eliminated. But the test does not confirm its truth. A 
good hypothesis in science, therefore, is one that lends itself to the severest test, 
that is, one that generates the widest range of falsifiable consequences [Popper
1959].

16.1.1 Models in Operations Research and Management Science


The area of operations research will be considered as an example of a more 
application-oriented discipline, which is here called “technology,” in which mod- 
eling plays a predominant role. Even though one might dispute whether opera- 
tions research is a science or a technology, this discussion will follow Symonds, 
who, as the President of the Institute of Management Science, stated, “Operations 
Research is the development of general scientific knowledge” [Symonds 1965, 
p. 385]. 

What, now, is a model in operations research? Most authors using the term 
model take it for granted that the reader knows what a model is and what it means. 
Arrow, for instance, uses the term model as a specific part of a theory when he 
says, “Thus the model of rational choice as built up from pairwise comparisons 
does not seem to suit well the case of rational behaviour in the described game 
situation” [Arrow 1951]. He presumably refers to the model of rational choice, 
because the theory he has in mind does not give a very adequate description of 
the phenomena with which it is concerned, but only provides a highly simplified 
schema. In the social and behavioral sciences as well as in the technologies, it is 
very common that a certain theory is stated in rather broad and general terms 
while models, which are sometimes required to perform experiments in order to 
test the theory, have to be more specific than the theories themselves. “In the lan- 
guage of logicians it would be more appropriate to say that rather than con- 
structing a model they are interested in constructing a quantitative theory to match 
the intuitive ideas of the original theory” [Suppes 1961]. Rivett, in his book Prin- 
ciples of Model Building [1972], offers three different kinds of classifications of 
models; when enumerating the models that he suggests be put into the different 
classes, he no longer uses the term model but talks of “problems in this area” and 
“the theory of this area” as a not-too-well-defined entity of knowledge. Ackoff 
suggests as a model of decision making a six-phase process that is supposed to 
be a good picture (model) of the real decision-making process [Ackoff 1962]. 
This is only one example of quite a number of very similar models of decision 
making. 

If we consider the size of some of the models used in operations research, 
containing more than 10,000 variables and thousands of constraints, we can 
easily see what does not distinguish a theory from a model: It is not the com- 
plexity, it is not the size, it is not the language, and it is not even the purpose. In 
fact, there seems to be only a gradual distinction between theory and model. 
While a theory normally denotes an entire area or type of problem, it is more 
comprehensive but less specific than a model (e.g., decision theory, inventory 
theory, queueing theory, etc.); a model most often refers to a specific context or 
situation and is meant to be a mapping of a problem, a system, or a process. In 
contrast to a scientific theory, containing scientific laws as hypotheses, a model 
normally does not assert invariance with respect to time and space but requires 
modifications whenever the specific context for which the model was constructed 
changes. 

In the following, we will concentrate on models rather than on theories. Real- 
izing that there is quite a variety of types of models, we do not think that it is 
important and necessary for our purposes to distinguish models by their language 
(mathematics or logic is considered to be a modeling language), by area, by 
problem type, by size, and so on. One classification, however, seems to be impor- 
tant: the distinction of models by their character. Scientific theories were already 
divided into formal theories and factual theories. For models, particularly in the 
area of the technology in which values and preferences enter our considerations, 
we will have to distinguish among the following: 


1. Formal models. These are models that are purely axiomatic systems from 
which we can derive if—then statements and the hypotheses of which are 
purely fictitious. These models can only be checked for consistency; they can 
neither be verified nor falsified by empirical arguments. 

2. Factual models. These models include in their basic hypotheses falsifiable 
assumptions about the object system; that is, conclusions drawn from these 
models have a bearing on reality and they, or their basic hypotheses, have to 
be verified or can be falsified by empirical evidence. 

3. Prescriptive models. These are models that postulate rules according to which 
processes have to be performed or people have to behave. This type of model 
will not be found in science, but it is a common type of model in practice. 


The distinction between these three different kinds of models is particularly 
important when using them: All three kinds of models can look exactly the same, 
but the “value” of their outputs is quite different. It is therefore rather dangerous 
not to realize which type of model is being used, because we might take a formal 
model to be a factual model or a prescriptive model to be a factual model, and 
this could have quite severe consequences for the resulting decision. 

As an example, let us look at the above-mentioned Ackoff model of decision 
making. Is it a formal, a factual, or a prescriptive model? If it is a formal model, 
we cannot derive from it any conclusion for real decision making. If it is a factual 
model, then it would have to be verified or falsified before we can take it as a 
description of real decision making. The assertion, however, that decision making 
proceeds in phases was already empirically falsified in 1966 [Witte 1968]. Still, 
a number of authors stick to this type of model. Do they want to interpret their 
model as a prescriptive model? This would only be justified if they could show 
that, for instance, decision making can be performed more efficiently when done 
in phases. This, however, has never been shown empirically. Therefore, we can 
only conclude that authors suggesting a multiphase scheme as a model for deci- 
sion making take their suggestion as a formal model and do not want to make 
any statement about reality, or that they are using a falsified, that is, invalid and 
false, factual model. 


16.1.2 Testing Factual Models 


The quality of a model depends on the properties of the model and the functions 
for which the model is designed. In general, models will have to have at least the 
following three major properties: logical consistency, usefulness, and efficiency. 
By logical consistency, we mean that all operations and transformations have 
been performed properly and that all conclusions follow from the hypothesis. This 
consistency has to be demanded of all types of models, whether they are formal, 
factual, or prescriptive. By usefulness, we mean that the model has to be helpful 
for the function for which it has been designed. By efficiency, we mean that the 
model, as the tool to achieve an end, has to fulfill the desired function at a 
minimum of effort, time, and cost. 

In decision making and problem solving, factual models will be needed to 
describe, to explain, and to predict phenomena and consequences. For “condi- 
tional predictions,” formal models will also be useful in order to obtain if—then 
statements, for instance, in the framework of simulation. Formal models will 
also be useful and necessary for the area of communication within the decision- 
making process and for relaying the resolutions or conclusions of the decision 
or problem-solving process to the “actors.” One should assume that prescriptive 
models are the most common in decision making. This, however, is only true 
if one calls all “decision models”—that is, models that contain an objective 
function by which an optimal solution can be determined—prescriptive models. 
To our mind, this is not quite appropriate because these kinds of models only 
prepare suggestions for possible decisions; the normative or prescriptive char- 
acter is acquired only after the “solution” has been declared a decision by 
the authorized decision maker. A much more important feature of these models, 
it seems to us, is that they have to describe or define properly the conditions 
that limit the action space (such as capacities, financial resources, legal restric- 
tions, etc.). 

We can now restate the notion of the quality of a model more precisely: we 
already mentioned that consistency is one of the necessary conditions for quality. 
Usefulness of a model will have to be defined for each of the three different types 
of models differently: 


1. A factual model can be called useful if it is “factually true” (in contrast
to logically true), that is, if it maps the object system with an appropriate
precision (which can only be tested empirically). The model also has to
generate knowledge; that is, the user of a model should gain knowledge he
or she would not have gained without using the model or which he or she
did not have available before using the model.

2. Formal models can be neither verified nor falsified empirically. Such a model 
will be considered useful if activities such as teaching, explaining, and com- 
munication become more efficient with the model than without it. 

3. Prescriptive models also cannot be verified or falsified. They are the more 
useful the more effectively they help to enforce the desired behavior, to 
control predefined performance measures, and to define ranges within which 
decision makers have freedom to decide. 


Two prime factors in modeling are the modeling language and the quality of 
input data. The type of modeling language appropriate for models in decision 
making was already discussed in chapter 1. Here we shall elaborate some more 
on the quality of input data. 

The saying “garbage in—garbage out” is well known and speaks for itself. 
The following quotation from Josiah Stamp [1975, p. 236] points in the same 
direction: “Governments are very keen in amassing statistics. They collect them, 
add them, raise them to the nth power, take the cube root and make wonderful 
diagrams. But what you must never forget is that every one of these figures comes 
in the first instance from the village watchman who just puts down what he damn 
pleases.” 

It must, however, be borne in mind that the effort put into deriving and obtain- 
ing numerical values or relations must be geared to the value of the model, and 
that when data are scarce it may still be useful to draw conclusions from not 
fully satisfactory input data. In this case, a tentative look at the dependence of 
the solution upon the quality of the input data may be very advisable. 

The quality of the input data is closely related to the question of operational 
definitions for the relevant variables. The processes of defining variables and their 
operational indicators and measurement are intertwined. To quote White [1975, 
p. 102], “We take ‘measurement’ to be a special aspect of a ‘definition’.” One 
might take the view that measurement is the actual procedure for assigning the 
real numbers that constitute the measure. However, as pointed out in a previous 
section, this is the quantification process and in itself does not constitute a 
measure unless it is a homomorphism. The homomorphism then defines the 
measure. Very often when modeling in the area of social sciences, one will find
that relations, data, or values are stated in very vague ways. Goals, for instance,
may be stated as “trying to achieve satisfactory profits,” data as “the South of the
country is much poorer than the North,” and relations as “his investment
strategies were much more risky than those of his competitors.” Very often these
variables are measured subjectively, and point scales are used to transform the
“measurements” into numerical values. Even though it is necessary to include in
the model variables that are considered important but that are very hard to
operationalize and measure, the quality of the input data might have very limiting
effects on the degree of transformation of these variables that can be permitted
in the model. Rather than neglecting these kinds of data, one should consciously
determine which scale quality these data have and then make sure that only
admissible transformations are being used when processing these data in the
model. Table 16-1 sketches the hierarchy of scale levels including the
permissible transformations for each of the levels.

Table 16-1. Hierarchy of scale levels.

                 Permissible transformation
Type of scale    Verbal                          Formal                                       Invariance             Example
Nominal scale    One-to-one function             $x_i \neq x_j \rightarrow x_i' \neq x_j'$    Uniqueness of values   License plates
Ordinal scale    Monotonic increasing function   $x_i \leq x_j \rightarrow x_i' \leq x_j'$    Rank order of values   Marks
Interval scale   Affine function                 $x' = ax + b$                                Ratio of differences   Temperature (°C, °F)
Ratio scale      Similarity function             $x' = a \cdot x$                             Ratio of values        Length (cm, inch)
Absolute scale   Identity function               $x' = x$                                     Values                 Frequency
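
The invariance column of Table 16-1 can be checked numerically. The following is a minimal sketch, not part of the original study: the three measurements and the particular transformations (a cubic, a Celsius-to-Fahrenheit style affine map, and an inch-to-centimeter style similarity map) are assumptions chosen only for illustration.

```python
import math

def rank_order(values):
    """Indices of the values sorted ascending, i.e. the rank order."""
    return sorted(range(len(values)), key=lambda i: values[i])

def ratio_of_differences(values):
    """(v1 - v0) / (v2 - v1) for the first three values."""
    return (values[1] - values[0]) / (values[2] - values[1])

def ratio_of_values(values):
    """v1 / v0 for the first two values."""
    return values[1] / values[0]

# Hypothetical measurements, chosen only for illustration.
x = [10.0, 20.0, 40.0]

monotone = [v ** 3 for v in x]          # monotonic increasing function
affine = [1.8 * v + 32.0 for v in x]    # x' = a*x + b
similarity = [2.54 * v for v in x]      # x' = a*x

for name, t in [("monotone", monotone), ("affine", affine),
                ("similarity", similarity)]:
    print(
        name,
        "rank order kept:", rank_order(t) == rank_order(x),
        "difference ratio kept:",
        math.isclose(ratio_of_differences(t), ratio_of_differences(x)),
        "value ratio kept:",
        math.isclose(ratio_of_values(t), ratio_of_values(x)),
    )
```

Under these assumptions the cubic map preserves only the rank order, the affine map additionally preserves ratios of differences, and only the similarity map preserves ratios of values. A statistic that relies on an invariance the data's scale level does not offer is therefore not meaningful for those data.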

The testability of the components of a model—in the scientific and in the prac- 
tical context—depends largely on the operational definition of the hypotheses. In 
this sense, observation and formal analysis prior to model building can very often 
improve the testability of hypotheses. Let us illustrate this with the following 
example. In decision analysis, one normally distinguishes among decision 
making under certainty, decision making under risk, and decision making under 
uncertainty. One assumes that in decision making under risk the decision maker 
is able to store and process probability distribution functions. Here probabilities 
ought to be interpreted as Koopman-type probabilities—that is, probabilities as 
expressions of belief rather than in the frequentistic sense. This hypothesis is 
hardly testable because a situation of decision making under risk is not at all
homogeneous with respect to the available information. An improvement in the 
testability of hypotheses could be achieved if one would distinguish, for instance, 
among the following: 


1. Decision making when quantitative probabilities are known (interval scale)

2. Decisions when interval probabilities are known (hyperordinal scale)

3. Decisions when qualitative probabilities are known (ordinal scale)

4. Decisions when partially ordered nominal probabilities are known (ordinal
scale)

5. Decisions when nominal probabilities are known (states are known but not
truth ratable)

6. Decisions when only some of the nominal probabilities are known


It is obvious that the information storage and processing requirements that a 
human would need in order to decide “rationally” are quite different in the above 
cases and that the permissible operations in the model will also be different 
depending on the type of probability that can be assumed to exist. 

If the testing is done on the basis of the outputs of the analysis, the decision 
maker might already be able to indicate that the output of the analysis is not 
satisfactory, probably because important relations or variables have been omitted. 
If the decision maker or expert rates the output of the model as satisfactory, it 
gains the status of face-validity, sometimes in practice the most we can hope for. 

Ideally a model should now be tested by implementation, that is, by compar- 
ing actual with predicted results. This, however, in many instances is impossible 
for several reasons. 


1. Changes of environment: Factors such as sales, price levels, and so on might 
have changed while the model was built and implemented, and therefore the 
observed results after implementation of the model can no longer be com- 
pared with the predicted results. 

2. Changes in performance: If, for instance, the model is tested after imple- 
mentation by running the old procedure parallel to the model and if the old 
procedure included human activities, the performance of these activities 
might be improved by the persons because they know that the “new” model 
is being compared with their performance, which would probably drop again, 
if and when the operation of the new procedures would be terminated. 

3. Risk and uncertainty: It is obvious that if procedures have been designed to 
optimally decide in situations of risk or uncertainty, the “real” results cannot 
meaningfully be compared with the probabilistic prediction. 

4. Optimality: If only one solution is actually implemented, there is, of course, 
no way to compare this with other alternatives. In many cases, the optimal 
solution with which the model solution could be compared is not known at 
all because it is not computable or because optimality was defined subjec- 
tively in a way that is not objectively reproducible. 


It has already been pointed out that all kinds of theories and models can be 
and ought to be tested for consistency. In formal analysis, it might even be pos- 
sible to prove consistency, which does not mean that models and theories for 
which consistency has not yet been proven are not formally correct. For “factual” 
or “substantial” theories and models, empirical testing of basic hypotheses, rela- 
tions, and resulting outputs is absolutely necessary in order to achieve a certain 
degree of confirmation of the theory or the model. This fact is often neglected 
when working with theories and models. If, for instance, the hypothesis of
“rationality” in decision-making models is “justified” by defining rationality by more
basic axioms such as transitivity, reflexivity, existence of an ordering, and so 
on, which seem quite plausible and natural, then the model or the theory might 
become more testable but certainly not better confirmed. To confirm the model 
would require empirically testing either the main hypothesis or the presumably 
more operational basic axioms. This, of course, still does not determine uniquely 
the methods that can be used for testing hypotheses. These methods will depend 
on the area in which the model is being used (physics, engineering, management) 
and the purposes for which the model has been built. Thus, in scientific inquiry, 
probabilistic tests might not be acceptable because scientific laws assert deter- 
ministic invariance. These methods, however, might be the only available ones 
for testing models in areas such as management, sociology, and political decision 
making. 

In the following we shall report on empirical research concerning two main 
components of fuzzy set theory: membership functions and operators
(connectives, aggregators).


16.2 Empirical Research on Membership Functions 


Measurement means assigning numbers to objects such that certain relations 
between numbers reflect analogous relations between objects. In other words, 
measurement is the mapping of object relations into numerical relations of the 
same type. 

If it is possible to prove that there is a homomorphic mapping $f: E \to N$ from
an empirical relational structure $(E, P_1, \ldots, P_n)$ with a set of objects $E$ and an
n-tuple of relations $P_i$ into a numerical relational structure $(N, Q_1, \ldots, Q_n)$ with a
set of numbers $N$ and relations $Q_i$, then a scale $\langle E, N, f \rangle$ exists. By specifying
the admissible transformations, the grade of uniqueness is determined.
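
The homomorphism requirement can be made concrete for a single binary relation. The sketch below is only an illustration under stated assumptions: the three objects, the empirical relation "is not heavier than" (given as a set of pairs), and the number assignments `f` and `g` are hypothetical, and the check uses the common reading that the empirical relation holds exactly when the numerical relation holds.

```python
import operator

def is_homomorphism(objects, empirical_rel, f, numerical_rel):
    """True if (a, b) is in the empirical relation exactly when
    numerical_rel(f[a], f[b]) holds, for every ordered pair of objects."""
    return all(
        ((a, b) in empirical_rel) == numerical_rel(f[a], f[b])
        for a in objects
        for b in objects
    )

# Hypothetical empirical structure: pairs (a, b) meaning
# "a is not heavier than b", as observed on a balance.
objects = ["a", "b", "c"]
not_heavier = {
    ("a", "a"), ("b", "b"), ("c", "c"),
    ("a", "b"), ("a", "c"), ("b", "c"),
}

f = {"a": 1.0, "b": 2.5, "c": 7.0}   # candidate scale values
g = {"a": 3.0, "b": 2.5, "c": 7.0}   # misorders a and b

# A monotonic increasing transformation of f (admissible on an ordinal scale).
f_squared = {k: v ** 2 for k, v in f.items()}

print(is_homomorphism(objects, not_heavier, f, operator.le))
print(is_homomorphism(objects, not_heavier, g, operator.le))
print(is_homomorphism(objects, not_heavier, f_squared, operator.le))
```

With only an order relation to preserve, any monotonic increasing transformation of `f` is again a homomorphism, which is what makes the resulting scale ordinal rather than interval.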

Therefore measurement starts by formulating the properties of the empirical 
structure; implicitly, the intended object space is modeled on a nonnumerical 
level. Strictly speaking, at the very beginning there should be a semantic defini- 
tion of the central concepts; this would considerably facilitate the consistent use 
of the relevant principles. Unfortunately, this definition has not yet been possible 
for the concept of membership. Membership has a clear-cut formal definition. 
However, explicit requirements for its empirical/experimental measurement are 
still missing. Under these circumstances, it is not surprising that apart from first 
steps by Norwich and Turksen [1981], genuine measurement structures have not 
yet been developed. 

Under these circumstances, one could wait and see, until a satisfactory defin- 
ition is available. However, one should remember that up to the beginning of the 
twentieth century, even in the “hard sciences,” measures were used without being 
equipped with adequate measurement theories. Usually the measurement tools 
used were based on not much more than plausible reasons. Nevertheless, the 
success of the natural sciences is undisputed. Hence, for the purpose of empiri- 
cal research, it may be tolerable to use plausible techniques. 

Firstly, such a scale can serve as an operational definition of membership. Sec- 
ondly, a specific concept can be criticized and hence may help to obtain useful 
improvements. We shall present two models for membership functions. Let us 
call the first “Type A-model” and the second “Type B-model.” 


16.2.1 Type A-Membership Model 


Of prime importance is the determination of the lowest necessary scale level of 
membership for a specific application. The purpose of the model A-membership 
was to empirically investigate aggregation operators. In this instance, it was suf- 
ficient to determine degrees of membership for a predefined set of objects rather 
than continuous membership functions. The requested scale level should be as 
low as possible in order to facilitate data acquisition, which usually involves the 
participation of human beings. On the other hand, a suitable numerical handling 
is desirable in order to ensure mathematically appropriate operations. Regarding 
the five classical scale types—nominal, ordinal, interval, ratio, and absolute 
scale—the interval scale level seems to be most adequate. In this respect, we 
cannot follow Sticha, Weiss, and Donnell [1979], who assert that membership 
has to be measured on an ordinal scale. Usually the intended mathematical oper- 
ations require at least interval-scale quality. 

The easiest way to obtain data is to ask some subjects directly for member- 
ship values. However, it is well known that scales that are developed by using 
the so-called direct methods may be distorted by a number of response biases 
[Cronbach 1950]. On the other hand, indirect methods work on the basis of much
weaker assumptions using ordinal judgments only. Their advantages are
simplicity and robustness with respect to response biases.

Table 16-2. Empirically determined grades of membership.

Stimulus x             $\mu_M(x)$   $\mu_C(x)$   $\mu_{M \cap C}(x)$
 1. bag                0.000        0.985        0.007
 2. baking tin         0.908        0.419        0.517
 3. ballpoint pen      0.215        0.149        0.170
 4. bathtub            0.552        0.804        0.674
 5. book wrapper       0.023        0.454        0.007
 6. car                0.501        0.437        0.493
 7. cash register      0.692        0.400        0.537
 8. container          0.847        1.000        1.000
 9. fridge             0.424        0.623        0.460
10. Hollywood swing    0.318        0.212        0.142
11. kerosene lamp      0.481        0.310        0.401
12. nail               1.000        0.000        0.000
13. parkometer         0.663        0.335        0.437
14. pram               0.283        0.448        0.239
15. press              0.130        0.512        0.101
16. shovel             0.325        0.239        0.301
17. silver spoon       0.969        0.256        0.330
18. sledgehammer       0.480        0.012        0.023
19. water bottle       0.564        0.961        0.714
20. wine barrel        0.127        0.980        0.185

Their disadvantage is that many judgments are needed, since the ordinal judg- 
ment provides relatively little information. This drawback seemed acceptable in 
order to avoid distortions of the data. Thus we decided to use a method that yields 
an interval scale on the basis of ordinal ratings: After a set of suitable objects has 
been established, subjects are asked for the grades of membership on a percent- 
age scale. People are accustomed to this type of judgment, and division by 100 
provides the normalized 0-1 values. The obtained data are interpreted as ranks. 
The subsequent scaling procedure refers mainly to a method suggested by 
Diederich, Messick, and Tucker [1957] based on Thurstone’s “Law of Categori- 
cal Judgment” [Thurstone 1927]. 

A detailed description of the method can be found in Thole, Zimmermann, and 
Zysno [1979]. Table 16-2 illustrates the type of membership information that was 
obtained and the type of objects used for experimentation. The transformation of 
the observed information to degrees of membership was performed by a com- 
puter program written for this purpose. 


16.2.2 Type B-Membership Model 


Often a certain concept can be considered as a context-specific version of a more 
general feature. For instance, the set of young men is a subset of all objects with 
the feature age. We shall call this general feature the “base variable.” This coin- 
cides with the definition of a base variable in definition 9-1. The scale of the base 
variable that is normally generally accepted (here age in years) will be called a 
“judgmental scale.” In contrast to the scale of the base variable, the scale of the 
“specific version” is context-dependent. Thus a term in definition 9-1 does not 
necessarily correspond to “the specific version” of the base variable, because 
“terms” did not explicitly assume a specific context. If the term young refers to 
the age of men (by contrast to the age of flies, cars, houses, or dinosaurs), then 
we can assume that the observer has some idea about what “young” means with 
respect to men. He has a “standard” with respect to which he evaluates age in 
terms of “young,” “old,” etc. We shall, therefore, call this specific context- 
dependent scale an “evaluational scale.” If there exist a judgmental scale and an 
evaluational scale, both referring to the same empirical relational structure, then 
a mapping from one numerical relative into the other that reflects the differences 
of the basic empirical relational structure with respect to the same set of elements 
would be possible. If, on the other hand, the scale of (for instance) the base vari- 
able and the mapping (function) was known, then the scale of the special feature 
could be determined. The mapping (function) can be considered as the member- 
ship function, which has to be determined. The required scale level of the mem- 
bership function essentially remains the same as for type A model. In contrast to 
model A, however, we used direct scaling methods. These involve less effort and 
are justified by the existence of the base variable, which provides extra control 
with respect to judgmental errors of the subjects. The judgment (valuation) of
membership can be regarded as the comparison of object x with a standard (ideal),
which results in a distance d(x). If the object corresponds fully with the standard,
the distance shall be zero; if no similarity between standard and object exists, the
distance shall be “∞.” If the evaluation concept is represented formally by a fuzzy
set $P \subseteq X$, then a certain degree of membership $\mu_P(x)$ is assigned to each element
x. We shall assume that this degree of membership is a function of the “distance,”
d, between the two above-mentioned scales (P representing a fuzzy set defined
context-dependently as a subset of the universe X).
Thus we define

$$\mu_P(x) = \frac{1}{1 + d(x)} \tag{16.1}$$

where d(x) is the “distance” of the two scales for the element $x \in X$. The distance
function now has to be specified. A specific monotonic function of the similarity
with the ideal could, as a first approximation, be $d'(x) = 1/x$.

Experience shows, however, that ideals are very rarely fully realized. As an 
aid to determine the relative position, very often a context-dependent standard b 
is created. It facilitates a fast and rough preevaluation such as “rather positive,” 
“rather negative,” and so on. As another context-dependent parameter, we can 
use the evaluation unit a, similar to a unit of length such as feet, meters, yards, 
and so on. If one realizes furthermore that the relationship between a physical 
unit and perceptions is generally exponential [Helson 1964], then the following 
distance function seems appropriate: 


$$d(x) = \frac{1}{e^{a(x-b)}} \tag{16.2}$$

Substituting equation (16.2) into model (16.1) yields the logistic function

$$\mu_P(x) = \frac{1}{1 + e^{-a(x-b)}} \tag{16.3}$$


It is S-shaped, as demanded by several authors [Goguen 1969; Zadeh 1971]. 
Formally, b is the inflection point and a determines the slope of the function
(the slope at x = b is a/4).
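
The substitution of the distance function into model (16.1) can be checked numerically. This is a minimal sketch; the parameter values `a = 0.2` and `b = 30.0` are hypothetical and serve only to exercise the formulas.

```python
import math

def distance(x, a, b):
    """Distance function of equation (16.2): d(x) = 1 / exp(a * (x - b))."""
    return 1.0 / math.exp(a * (x - b))

def membership_from_distance(x, a, b):
    """Model (16.1) with the distance of equation (16.2) substituted."""
    return 1.0 / (1.0 + distance(x, a, b))

def logistic_membership(x, a, b):
    """Logistic function of equation (16.3)."""
    return 1.0 / (1.0 + math.exp(-a * (x - b)))

# Hypothetical semantic parameters, for illustration only.
a, b = 0.2, 30.0

for x in (10.0, 25.0, 30.0, 35.0, 50.0):
    # Both routes give the same degree of membership.
    print(x, membership_from_distance(x, a, b), logistic_membership(x, a, b))

# Central-difference estimate of the slope at the inflection point x = b.
h = 1e-6
slope_at_b = (logistic_membership(b + h, a, b)
              - logistic_membership(b - h, a, b)) / (2 * h)
print(slope_at_b, a / 4)
```

At x = b the degree of membership is 0.5, and the numerically estimated slope there agrees with a/4, so a larger a gives a sharper transition between non-membership and membership.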

From the point of view of linear programming, model (16.3) has the additional 
advantage that it can easily be linearized by the following transformation: 


$$\ln \frac{\mu}{1 - \mu} = a(x - b) \tag{16.4}$$


where $\mu$ stands for $\mu_P(x)$.
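
Because the transformed model is linear in x, the parameters a and b can be estimated with a straight-line fit. The sketch below uses ordinary least squares, which is an assumption of this illustration rather than the procedure reported by the authors; the data points are generated from hypothetical parameters so that the recovery can be checked.

```python
import math

def logit(mu):
    """Left-hand side of the linearizing transformation: ln(mu / (1 - mu))."""
    return math.log(mu / (1.0 - mu))

def fit_logistic(xs, mus):
    """Least-squares line through the points (x, logit(mu)).

    Since ln(mu / (1 - mu)) = a*x - a*b, the slope of the line is a and
    the intercept is -a*b. All mus must lie strictly between 0 and 1.
    """
    ys = [logit(m) for m in mus]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, -intercept / slope

# Hypothetical "true" parameters and noise-free judgments generated from them.
true_a, true_b = 0.15, 40.0
xs = [20.0, 30.0, 40.0, 50.0, 60.0]
mus = [1.0 / (1.0 + math.exp(-true_a * (x - true_b))) for x in xs]

a_hat, b_hat = fit_logistic(xs, mus)
print(a_hat, b_hat)
```

With real judgments the degrees of membership 0 and 1 have to be excluded or adjusted before the transformation, because the logarithm is undefined there.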

The parameters a and b will have to be interpreted differently depending 
on the situation that is modeled. From a linguistic point of view, a and b can be 
considered as semantic parameters. 

Model (16.3) is still too general to fit subjective models of different persons. 
Frequently only a certain part of the logistic function is needed to represent a 
perceived situation. This is also true for measuring devices such as scales, 
thermometers, and so on, which are designed for specific measuring intervals 
only. 

In order to allow for such a calibration of our model, we assume that only a 
certain interval of the physical scale is mapped into the open interval (0, 1) (see
figure 16-1). Whenever stimuli are smaller than or equal to the lower bound or
larger than or equal to the upper bound, the grade of membership of 0 or 1,
respectively, is assigned to them. This is achieved by changing the range by
legitimate scale transformations such that the desired interval is mapped into [0, 1].

Figure 16-1. Calibration of the interval for measurement.

Since we requested an interval scale, the interval of the degrees of member- 
ship may be transformed linearly. On this scale level, the ratios of two distances 
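are invariant. Let $\overline{\mu}$ and $\underline{\mu}$, respectively, be the upper and lower bounds of the
normalized membership scale, let $\mu_i$ be a degree of membership between these
bounds, $\underline{\mu} < \mu_i < \overline{\mu}$, and let $\underline{\mu}'$, $\mu_i'$, $\overline{\mu}'$ be the corresponding values on the
transformed scale. Then

$$\frac{\mu_i - \underline{\mu}}{\overline{\mu} - \underline{\mu}} = \frac{\mu_i' - \underline{\mu}'}{\overline{\mu}' - \underline{\mu}'} \tag{16.5}$$

For the normalized membership function, we have $\underline{\mu} = 0$ and $\overline{\mu} = 1$.
Hence

$$\mu_i' = \mu_i(\overline{\mu}' - \underline{\mu}') + \underline{\mu}' \tag{16.6}$$

Generally it is preferable to define the range of validity by specifying the
interval d with the center c as shown in figure 16-1.
Hence

$$\overline{\mu}' = d + \underline{\mu}' \tag{16.7}$$

and

$$\underline{\mu}' = 2c - \overline{\mu}' \tag{16.8}$$

Substituting equation (16.7) into equation (16.8) yields

$$\underline{\mu}' = 2c - d - \underline{\mu}' \tag{16.9}$$

Solving equation (16.9) for $\underline{\mu}'$ gives

$$\underline{\mu}' = c - d/2 \tag{16.10}$$

and inserting equations (16.10) and (16.7) into equation (16.6) yields

$$\mu_i' = d(\mu_i - 1/2) + c \tag{16.11}$$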


The general model of membership (16.3) is specified by two parameters of
calibration if $\mu_i$ is replaced by $\mu_i'$. Solving this equality for $\mu_i$ leads to the complete
model of membership:

$$\mu(x) = \left[ \frac{1}{d} \left( \frac{1}{1 + e^{-a(x-b)}} - c \right) + \frac{1}{2} \right]_0^1 \tag{16.12}$$

$[\,\cdot\,]_0^1$ indicates that values outside of the interval [0, 1] have no real meaning. The
measurement instrument does not differentiate there. Hence

$$x < \underline{x} \rightarrow \mu(x) = 0$$
$$x > \overline{x} \rightarrow \mu(x) = 1 \tag{16.13}$$
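
The complete model can be sketched as a clipped, rescaled logistic function. The parameter values below are hypothetical. The helper `cutoff_points` is not given in the text; it is derived here by setting the unclipped expression in (16.12) to 0 and 1, and it assumes a > 0, c - d/2 > 0, and c + d/2 < 1.

```python
import math

def logistic(x, a, b):
    """General model (16.3)."""
    return 1.0 / (1.0 + math.exp(-a * (x - b)))

def complete_membership(x, a, b, c, d):
    """Complete model (16.12): rescale the logistic value by the interval
    width d around the center c, then clip the result to [0, 1]."""
    raw = (logistic(x, a, b) - c) / d + 0.5
    return min(1.0, max(0.0, raw))

def cutoff_points(a, b, c, d):
    """Stimuli at which the unclipped value reaches 0 and 1 (derived here,
    see the assumptions in the lead-in)."""
    low, high = c - d / 2.0, c + d / 2.0
    lower = b + math.log(low / (1.0 - low)) / a
    upper = b + math.log(high / (1.0 - high)) / a
    return lower, upper

# Hypothetical parameters: only the part of the logistic curve between
# 0.2 and 0.8 is used and stretched over the full membership scale.
a, b, c, d = 0.2, 30.0, 0.5, 0.6
x_lower, x_upper = cutoff_points(a, b, c, d)

for x in (20.0, x_lower, 27.0, 30.0, 33.0, x_upper, 40.0):
    print(round(x, 2), round(complete_membership(x, a, b, c, d), 4))
```

With c = 0.5 and d = 1 the rescaling has no effect and the function reduces to the general logistic model.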


The determination of the parameters from empirical databases does not pose any
difficulties in the general model (16.3). It should be mentioned that not only
monotonic functions, such as those discussed so far, can be described, but so can
unimodal functions, by representing them by an increasing ($S_I$) and a decreasing
($S_D$) part. Formally, they can be represented as the minimum or maximum,
respectively, of two monotonic membership functions each:

$$\mu_{S_I S_D}(x) = \min[\mu_{S_I}(x), \mu_{S_D}(x)]$$
$$\mu_{S_D S_I}(x) = \max[\mu_{S_I}(x), \mu_{S_D}(x)]$$

Figure 16-2. Subject 34, “Old Man.”
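
The minimum and maximum constructions can be sketched with two logistic branches. Modeling the decreasing branch as a logistic curve with a negative slope parameter is an assumption of this illustration, and all parameter values are hypothetical (a unimodal "young man" that rises around age 15 and falls around age 35).

```python
import math

def s_curve(x, a, b):
    """Monotonic logistic branch; increasing for a > 0, decreasing for a < 0."""
    return 1.0 / (1.0 + math.exp(-a * (x - b)))

def unimodal_min(x, a_inc, b_inc, a_dec, b_dec):
    """Peak-shaped function: minimum of an increasing and a decreasing branch."""
    return min(s_curve(x, a_inc, b_inc), s_curve(x, -a_dec, b_dec))

def unimodal_max(x, a_inc, b_inc, a_dec, b_dec):
    """Valley-shaped function: maximum of an increasing and a decreasing branch."""
    return max(s_curve(x, a_inc, b_inc), s_curve(x, -a_dec, b_dec))

# Hypothetical parameters for a peak: rising branch around 15, falling around 35.
peak = dict(a_inc=0.5, b_inc=15.0, a_dec=0.3, b_dec=35.0)
# Hypothetical parameters for a valley: falling branch first, rising branch later.
valley = dict(a_inc=0.5, b_inc=35.0, a_dec=0.3, b_dec=15.0)

for age in (5.0, 15.0, 25.0, 35.0, 60.0):
    print(age,
          round(unimodal_min(age, **peak), 4),
          round(unimodal_max(age, **valley), 4))
```

The minimum yields a single peak when the increasing branch lies to the left of the decreasing one; the maximum yields a single valley when the order of the branches is reversed.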


A computer program was written to process the observed data. 

The type B-model for membership functions, which provides a membership 
function rather than degrees of membership for single elements of a fuzzy set (as 
Type A does), was also empirically tested. 

We shall present results concerning some very common fuzzy sets: “young men,”
“old men,” and so on. With membership functions available, we could also test
models of modifiers such as “very.”

The evaluation of the data showed a good fit of the model. Figures 16-2 
through 16-7 show the membership functions given by six different persons. As 
can be seen, the concepts “very young men” and “young men” are realized in the 
monotonic type as well as in the unimodal. The detailed data and results can be 
found in a major report of the authors [Zimmermann and Zysno 1982]. 

Figure 16-3. Subject 58, “Very Old Man.”
Figure 16-4. Subject 5, “Very Young Man.”
Figure 16-5. Subject 15, “Very Young Man.”
Figure 16-6. Subject 17, “Young Man.”
Figure 16-7. Subject 32, “Young Man.”


One may ask whether a general membership function for each of the four sets
can be established. Though the variety of conceptual comprehension is rather
remarkable, there should be an overall membership function at least in order to
have a standard of comparison for the individuals. This is achieved by
determining the common parameter values a, b, c, and d for each set. Obviously, the
general membership functions of “old man” and “very old man” are rather similar
(see figures 16-8 and 16-9). Practically, they differ only with respect to their
inflection points, indicating a difference of about five years between “old man”
and “very old man.” The same holds for the monotonic type of “very young man”
and “young man”; their inflection points differ by nearly 15 years. It is
interesting to note that the modifier “very” has a greater effect on “young” than on “old,”
but in both cases it can be formally represented by a constant. Several subjects
provided the unimodal type in connection with “very young” and “young.” Again
the functions show a striking congruency.


Figure 16-8. Empirical membership functions “Very Young Man,” “Young Man,”
“Old Man,” “Very Old Man.”

Figure 16-9. Empirical unimodal membership functions “Very Young Man,”
“Young Man.”


16.3 Empirical Research on Aggregators


In section 3.2.2, a number of possible operators were mentioned. We saw that
they were assigned in various ways to set-theoretic operations, such as
intersection, union, etc. For some of these operators, axiomatic formal justifications
were also given. In definition 14-1, the triple decision-intersection-min-operator
was used. Some indication was given there that, from a factual point of view, this
triple might turn out not to be true. After what has been said in section 16.1, it
should be obvious that for a factual use of fuzzy set models only empirical
verification of models for the aggregators is appropriate. This can only be done in
specific contexts, and the results will therefore be of limited validity.

Some empirical testing of aggregators has been performed in the context of
fuzzy control. We shall report on empirical research done in the context of human
evaluation and decision making, that is, concerning the question, “How do human
beings aggregate subjective categories, and which mathematical models describe
this procedure adequately?”

As already mentioned, the term decision has been defined in many different 
ways. A decision also has many different aspects, for example, the logical aspect, 
the information-processing aspect, etc. We shall focus our attention on the last 
aspect: The search for and the modeling, processing, and aggregation of infor- 
mation. A decision in the sense of definition 14-1, rather than being some kind 
of optimization, is the search for an action that satisfies all constraints and all 
aspiration levels representing goals. “Deciding” about the creditworthiness of a 
person might be called an “evaluation” rather than a decision. It means, however, 
checking on whether a person satisfies all aspiration levels concerning security, 
liquidity, business behavior, and so on. 

In the following, we will give a rough description of two experiments and their 
results. The first experimental design started from the triple “decision- 
intersection-min-operator” and tried to find out whether the min-operator was 
adequate for modeling the intersection. However, it did not question the pair 
“decision-intersection.” The second experiment is no longer limited to consider- 
ing a decision as the intersection; it relinquishes the set-theoretic interpretation 
of a decision altogether. 


Test 1: Intersection-min-operator [Thole et al. 1979] 


Two fuzzy sets, A and B, were considered. It seems reasonable to demand that 
the following conditions concerning the judgmental “material” are satisfied: 


1. The attributes characterizing the members of the sets A and B are
independent, that is, some magnitude of $\mu_A$ is not affected by some magnitude of $\mu_B$
and vice versa. As an operational criterion for this kind of independence, a
correlation of zero is demanded:


$$r_{\mu_A \mu_B} = 0$$


2. If $\mu_{A \cap B}$ represents the aggregation of $\mu_A$ and $\mu_B$, modeling the intersection,
and if $w_A$ and $w_B$ are weights, then $\mu_{A \cap B}$ can be described by


$$\mu_{A \cap B} = (w_A \mu_A) \circ (w_B \mu_B)$$


where $\circ$ stands for some algebraic operation. But since the models proposed
do not take into account the different importance of the sets with respect to
their intersection, equal weights are demanded:


$$w_A = w_B$$


As an operational criterion for equal weights, equal correlations are
demanded:


$$r_{\mu_A \mu_{A \cap B}} = r_{\mu_B \mu_{A \cap B}}$$


With regard to these conditions, three fuzzy sets were chosen: “metallic
object” [Metallgegenstand], “container” [Behälter], and “metallic container”
[Metallbehälter].¹ It has to be proved that these sets satisfy the conditions
mentioned above.

Now the following hypotheses may be formulated: Let $\mu_A(x)$ be the grade of
membership of some object x in the set “metallic object” and $\mu_B(x)$ be the grade
of membership of x in the set “container”; then the grade of membership of x in
the intersection set “metallic container” can be predicted by


$$H_1\colon\ \mu_{A \cap B}(x) = \min\{\mu_A(x), \mu_B(x)\}$$
$$H_2\colon\ \mu_{A \cap B}(x) = \mu_A(x) \cdot \mu_B(x)$$
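The two candidate rules can be written down directly. The following sketch uses hypothetical grades of membership, since the scale values of table 16-2 are not reproduced here; the function names are likewise only illustrative.

```python
def predict_min(mu_a, mu_b):
    """Min-operator hypothesis: the intersection takes the smaller grade."""
    return min(mu_a, mu_b)

def predict_product(mu_a, mu_b):
    """Product-operator hypothesis: the intersection is the product of the grades."""
    return mu_a * mu_b

# Hypothetical grades for one object: "metallic object" and "container".
mu_a, mu_b = 0.6, 0.7
print(predict_min(mu_a, mu_b))                  # 0.6
print(round(predict_product(mu_a, mu_b), 3))    # 0.42
```

For grades in [0, 1] the product never exceeds the minimum, so the product rule always predicts a grade at or below that of the min rule.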


A pretest was carried out in order to guarantee that these assumptions were 
justified. 

Sixty students at the RWTH Aachen from 21 to 33 years of age, all of them 
native speakers of the German language, served as unpaid subjects in the main 
experiment. Each subject was run individually through two experimental ses- 
sions, the first one taking about 20 minutes, the second one about 40 minutes. In 
order to eliminate influences of memory as much as possible, the interviews were 
performed at an interval of approximately three days. 

Each subject was asked to evaluate each of the objects with respect to being
a member of A (metallic object), B (container), and $A \cap B$ (metallic container).
The three resulting membership scales are shown in table 16-2.

Now, what about the prediction of the empirical data for “metallic container” 
by the two candidate rules? Table 16-3 shows the empirical results together with 
the grades of membership computed by using the min-operator and the product- 
operator, respectively. 

Figures 16-10 and 16-11 show graphically the relationship between empiri- 
cal and theoretical grades of membership. The straight line indicates locations of 
perfect prediction—that is, if the operator makes perfect predictions and the data 
are free of error, then all points lie on the straight line. 


¹ This investigation was carried out in Germany. The corresponding German word is given
in brackets. It should be realized that the German language allows the forming of compound words;
hence the intersection is labeled by one word.


Table 16-3. Empirical vs. predicted grades of membership.

                          $\mu_{A \cap B}(x)$      $\mu_{A \cap B}(x)$      $\mu_{A \cap B}(x)$
Stimulus x                empirical      min            product
 1. bag                   0.007          0.000          0.000
 2. baking tin            0.517          0.419          0.380
 3. ballpoint pen         0.170          0.149          0.032
 4. bathtub               0.674          0.552          0.444
 5. book wrapper          0.007          0.023          0.010
 6. car                   0.493          0.437          0.219
 7. cash register         0.537          0.400          0.252
 8. container             1.000          0.847          0.847
 9. fridge                0.460          0.424          0.264
10. Hollywood swing       0.142          0.212          0.067
11. kerosene lamp         0.401          0.310          0.149
12. nail                  0.000          0.000          0.000
13. parkometer            0.437          0.335          0.222
14. pram                  0.239          0.283          0.127
15. press                 0.101          0.130          0.067
16. shovel                0.301          0.293          0.078
17. silver spoon          0.330          0.256          0.248
18. sledgehammer          0.023          0.012          0.006
19. water bottle          0.714          0.546          0.525
20. wine barrel           0.185          0.127          0.124


The question arises: Are the observed deviations small enough to be tolerable?
To answer this question we chose two criteria. The connective operator in
question should be accepted


1. if the mean difference between observed and predicted values is not
different from zero ($\alpha = 0.25$; two-tailed), and

2. if the correlation between observed and predicted values is higher than 0.95.


Since the observed differences are normally distributed, we used Student's
t-test as a statistic. Its inputs are the mean of the population (in this case, 0),
the mean of the sample (0.052 for the min-operator and 0.134 for the
product-operator), the observed standard deviation (0.067 for the minimum and 0.096
for the product), and the sample size (20). For the min-rule, the result is
t = 3.471, which is significant (df = 19; p < 0.01). For the product rule, the
result is t = 6.242, which is also significant (df = 19; p < 0.001). Thus, both
hypotheses $H_1$ and $H_2$ have to be rejected.
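The two t-values follow from the summary statistics quoted above through the usual one-sample formula $t = (\bar{d} - \mu_0) / (s / \sqrt{n})$. A minimal sketch (the function name is illustrative):

```python
import math

def one_sample_t(sample_mean, sample_sd, n, population_mean=0.0):
    """t statistic for the mean of n differences against a hypothesized mean."""
    return (sample_mean - population_mean) / (sample_sd / math.sqrt(n))

# Summary statistics reported in the text for the 20 stimuli.
t_min = one_sample_t(0.052, 0.067, 20)
t_prod = one_sample_t(0.134, 0.096, 20)
print(round(t_min, 3), round(t_prod, 3))   # 3.471 6.242
```

Both values are then compared with the t-distribution with n - 1 = 19 degrees of freedom.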

Despite the fact that none of the connective operators tested seems to be a
really suitable model for the intersection of subjective categories, there is a slight
superiority of the min-rule, as can be seen from the figures. If one were forced
to use one of these aggregation rules, then the minimum certainly would be the
better choice.

Figure 16-10. Min-operator: Observed vs. expected grades of membership.

The results of this experiment indicate that both product and minimum fail to 
be perfect models for the intersection operation in human categorizing processes. 


Test 2 [Zimmermann and Zysno 1980] 


The interpretation of a decision as the intersection of fuzzy sets implies no
positive compensation (trade-off) between the degrees of membership of the fuzzy
sets in question if either the minimum or the product is used as an operator. Each
of them yields degrees of membership of the resulting fuzzy set (decision) that
are on or below the lowest degree of membership of all intersecting fuzzy sets
(see test 1).

The interpretation of a decision as the union of fuzzy sets, using the
max-operator, leads to the maximum degree of membership achieved by any of the
fuzzy sets representing objectives or constraints. This amounts to a full
compensation of lower degrees of membership by the maximum degree of membership
(see example 14-4).

Figure 16-11. Product-operator: Observed vs. expected grades of membership.

Observing managerial decisions, one finds that there are hardly any decisions 
with no compensation between different degrees of goal achievement or between 
the degrees to which restrictions are limiting the scope of decisions. The com- 
pensation, however, rarely seems to be “complete,” as would be assumed using 
the max-operator. It may be argued that compensatory tendencies in human aggre- 
gation are responsible for the failure of some classical operators (min, product, 
max) in empirical investigations. 

Two conclusions can probably be drawn: Neither the noncompensatory “and,”
represented by operators that map between zero and the minimum degree of
membership (min-operator, product-operator, Hamacher’s conjunction operator (see
definition 3-15), Yager’s conjunction operator (see definition 3-16)), nor the fully
compensatory “or,” represented by operators that map between the maximum
degree of membership and 1 (maximum, algebraic sum, Hamacher’s disjunction
operator, Yager’s disjunction operator), is appropriate to model the aggregation
of fuzzy sets representing managerial decisions. It is necessary to define new
additional operators that imply some degree of compensation, that is, that also
map between the minimum degree of membership and the maximum degree of
membership of the aggregated sets. In contrast to modeling the noncompensatory
“and” or the fully compensatory “or,” they should represent types of aggregation
that we shall call “compensatory and.”
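The three ranges named above can be checked numerically for a pair of grades. In the following sketch the grades are hypothetical, and the unweighted geometric mean serves only as one simple example of an aggregate that falls between the minimum and the maximum.

```python
import math

def operator_values(mu_a, mu_b):
    """Aggregate two grades with operators from the three ranges named in the text."""
    return {
        "product": mu_a * mu_b,                      # at or below the minimum
        "min": min(mu_a, mu_b),
        "geometric mean": math.sqrt(mu_a * mu_b),    # between minimum and maximum
        "max": max(mu_a, mu_b),
        "algebraic sum": mu_a + mu_b - mu_a * mu_b,  # at or above the maximum
    }

values = operator_values(0.3, 0.8)   # hypothetical grades
for name, value in values.items():
    print(f"{name:15s} {value:.3f}")
```

For these grades the results increase from the product through the minimum and the geometric mean to the maximum and the algebraic sum, which is the ordering the paragraph describes.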

It is possible that human beings use many nonverbal connectives in their thinking
and reasoning. One type of these connectives may be called “merging connectives,”
which may be represented by the “compensatory and.” Being forced
to verbalize them, people may possibly map the set of “merging connectives” 
into the set of the corresponding language connectives (“and,” “or”). Hence, when 
talking, they use the verbal connective they feel to be closest to their “real” non- 
verbal connective. 

In analogy to the verbal connectives, the logicians defined the connectives $\wedge$
and $\vee$, assigning certain properties to each of them. By this, compound sentences
can be examined for their truth values. In contrast to this constructive process,
the empirical researcher has to analyze a given structure. Therefore, in order to
induce subjects to use their own connectives, we avoided the verbal connectives
“and” and “or” in our experiment and instead tried to elicit combined membership
values implicitly by means of a suitable experimental design and instruction.

We shall not describe in detail the experimental work in which different
compensatory operators were tested and in which the $\gamma$-operator (see definition 3-19)
turned out to perform best. The reader is referred to Zimmermann and Zysno
[1980] for details. We shall return to figure 1-1 and explain how credit clerks
arrive at a decision concerning the creditworthiness of customers by aggregating
their judgments concerning the determinants of creditworthiness. For details,
see Zimmermann and Zysno [1983]. A number of possible compensatory and
noncompensatory models were tested.
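Definition 3-19 is not reproduced in this section. As a rough sketch, the following code assumes the form in which the $\gamma$-operator is commonly stated, $\mu = (\prod_i \mu_i^{\delta_i})^{1-\gamma} \, (1 - \prod_i (1 - \mu_i)^{\delta_i})^{\gamma}$, with weights $\delta_i$ and a compensation parameter $\gamma$ in [0, 1]; the function name and the example grades are hypothetical.

```python
import math

def gamma_operator(mus, gamma, weights=None):
    """Compensatory 'and' (assumed form of the gamma-operator).

    gamma = 0 gives the weighted product (no compensation),
    gamma = 1 gives the weighted algebraic sum (full compensation).
    """
    if weights is None:
        weights = [1.0] * len(mus)          # assumption: equal weights of 1
    conjunction = math.prod(mu ** w for mu, w in zip(mus, weights))
    disjunction = 1.0 - math.prod((1.0 - mu) ** w for mu, w in zip(mus, weights))
    return conjunction ** (1.0 - gamma) * disjunction ** gamma

grades = [0.3, 0.8]   # hypothetical grades of two lower-level concepts
for g in (0.0, 0.5, 1.0):
    print(g, round(gamma_operator(grades, g), 3))
```

Intermediate values of $\gamma$ yield results between the product and the algebraic sum, which is how the operator expresses different grades of compensation.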

Searching for an appropriate decision situation, our choice fell on the rating 
of creditworthiness for the following reasons: 


1. This is a decision problem that is complex enough, yet still relatively
transparent and definable. In addition, this situation is highly standardized.
Even though test subjects come from different organizations, similar
evaluation schemes can be assessed.

2. A sufficiently large number of decision makers is available with about the 
same training background and similar levels of competence. 

3. The decision problem to be solved can be formulated and presented in a
realistic manner with respect to contents and appearance.

First, the creditworthiness hierarchy shown in figure 1-1 was developed together
with 18 credit clerks.

Testing the predictive quality of the proposed models required a suitable basis 
of stimuli that were to be rated with respect to the creditworthiness criteria and 
a weighting system that allowed a differentiated aggregation of these criteria. 

The natural basis of information for evaluating creditworthiness is the credit 
file. Therefore, we would have liked to analyze original bank files. However, a 
selection of finished cases is always a biased sample, since the initially rejected 
applicants are missing. Moreover, we wanted to avoid unnecessary troubles with 
banking secrecy. Therefore, it was decided to prepare 50 fictitious applicants for 
credit. 

A credit application form usually contains about 30 continuous or discrete
attributes of applicants. If each variable were dichotomized, $2^{30}$ different
borrowers could be produced. Clearly, one cannot realize all possible variations.
Therefore, a sample was drawn that satisfied the following two conditions: The
50 applicants (stimuli) should


1. be distributed as evenly as possible along the continuum of each aspect, and 
2. be typical for consumer credits. 


The files were produced in three stages: 


1. One hundred and twenty applications were completed randomly with respect
to the grade of extension of the 30 attributes.

2. The resulting 30 × 120 data matrix was purged of the 40 most unlikely and
least typical cases. The remaining 80 files were completed using information from an
inquiry agency (Schufa) and a short record of a conversation between the
client concerned and a credit clerk.

3. The applicants should represent the variability of the eight concepts. If each
aspect is dichotomized into two classes ($\mu \le 0.5 \to 0$, $\mu > 0.5 \to 1$), then
the resulting $2^8 = 256$ patterns of evaluation can be put in a 16 × 16 matrix.
With the assistance of two credit experts, the 80 credit files were placed into
this tableau. Finally, 30 files were eliminated in order to obtain equal
frequencies in rows and columns.
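Stage 3 can be illustrated by a small sketch that dichotomizes eight grades and locates the resulting pattern in a 16 × 16 tableau. How the patterns were actually arranged is not stated in the text; the sketch assumes, purely for illustration, that the first four concepts index the row and the last four the column.

```python
def pattern_cell(grades):
    """Map eight membership grades to a (row, column) cell of a 16 x 16 tableau.

    Assumption: concepts 1-4 form the row index and concepts 5-8 the
    column index; the source does not specify the layout.
    """
    if len(grades) != 8:
        raise ValueError("expected eight grades, one per concept")
    bits = [1 if mu > 0.5 else 0 for mu in grades]   # mu <= 0.5 -> 0, mu > 0.5 -> 1
    row = int("".join(map(str, bits[:4])), 2)
    col = int("".join(map(str, bits[4:])), 2)
    return row, col

# Hypothetical applicant: high on the first and the last concept only.
print(pattern_cell([0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.9]))   # (8, 1)
```

Counting the files per cell then shows which rows and columns are over-represented and therefore where files have to be eliminated.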


We could now expect that the 50 applicants varied evenly along each attribute
and each criterion. Only one attribute was constant: the credit amount was fixed
at DM 8,000, because the judgment “creditworthy” is only meaningful with
respect to a certain amount. A borrower might be good for DM 8,000, but not for
DM 15,000.

Figure 16-12. Predicted vs. observed data: Min-operator.


Surely it would be interesting to include the credit amount as a variable in this
investigation. But in order to obtain a stable basis for scaling and interpretation,
a substantial enlargement of the sample of credit experts would be necessary. This,
however, would have considerably exceeded our budget.

The predictive quality of each model can be evaluated by comparing observed
$\mu$-grades with theoretical $\mu$-grades. The latter can be computed for higher-level
concepts by aggregation of the lower-level concepts using the candidate formula.
The membership values for higher-level concepts should be predicted sufficiently
well by any lower level of the corresponding branch. The quality of a model can
be illustrated by a two-dimensional system, the axes of which represent the
observed versus theoretical $\mu$-values. Each applicant is represented by a point. In
the case of exact prognosis, all points must be located on a straight diagonal line.
As our data are collected empirically, there will be deviations from this ideal.
Figures 16-12 to 16-15 depict some of the typical results of the tests for
security as being determined by fourth-level determinants.
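The comparison of observed and theoretical grades can be summarized numerically as well as graphically. The following sketch computes the correlation and the mean deviation from the diagonal; the data are hypothetical and serve only to show that a high correlation alone does not exclude a systematic bias.

```python
import math

def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted grades."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

def mean_deviation(observed, predicted):
    """Mean signed deviation from the diagonal (zero for an unbiased model)."""
    return sum(o - p for o, p in zip(observed, predicted)) / len(observed)

observed = [0.2, 0.4, 0.7, 0.9]          # hypothetical observed grades
predicted = [0.1, 0.3, 0.6, 0.8]         # hypothetical model, uniformly too low
print(round(pearson_r(observed, predicted), 3))        # 1.0
print(round(mean_deviation(observed, predicted), 3))   # 0.1
```

Here all points lie on a line parallel to the diagonal: the correlation is perfect, yet every prediction is too low by 0.1.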

Unfortunately, the weighted geometric mean fails drastically in predicting
security by unmortgaged real estate and other net properties. In our view, this is
due to the fact that the model does not regard different grades of compensation.
The inclusion of different weights for the concepts does not seem to be sufficient
for describing the human aggregation process adequately. Consequently, it comes
as no surprise that the $\gamma$-model, comprising different weights as well as different
grades of compensation, yields the best results.

Figure 16-13. Predicted vs. observed data: Max-operator.
Figure 16-14. Predicted vs. observed data: Geometric mean operator.
Figure 16-15. Predicted vs. observed data: $\gamma$-operator.

It should be kept in mind, however, that $\gamma$ has not been determined
empirically. This would have required a further experimental study, based on a theory
describing the dependence of $\gamma$-values between higher and lower levels. For the
present, we are content with estimations derived from the data. At least it has
been shown that the judgmental behavior of credit clerks can be described quite
well if this parameter is taken into account.

Finally, the complete hierarchy of creditworthiness is presented together with
the elaborated weighting system and the $\gamma$-values for each level of aggregation
(figure 16-16).


16.4 Conclusions 


Our example analysis of the process of rating creditworthiness yields a criteria
structure that is concept oriented and self-explanatory. The $\gamma$-model, which was
from the beginning designed to satisfy mathematical requirements as well as to
describe human aggregation behavior, proved most adequate with respect to
prognostic power. This class of operators is continuous, monotonic, injective,
commutative, and in accordance with classical truth tables, which manifests their
relationship to formal logic and set theory. They aggregate partial judgments
such that the formal result of the aggregation ought to make them attractive for
empirically working scientists and useful for the practitioner.

Figure 16-16. Concept hierarchy of creditworthiness together with individual
weights $\delta$ and $\gamma$-values for each level of aggregation.

Banking managers not only evaluate but also decide. In order to complete the
description of a decision process, we had therefore asked the managers to arrive
at a decision for each fictitious credit application. If creditworthiness were an
attribute of the all-or-none type and all credit managers followed the same
decision-making process, then two homogeneous blocks of credit decisions (one block
with 100% yes decisions and one block with 100% no decisions) would result.
The number of positive decisions, however, varied over the entire range from 45
to 0. Obviously, there existed a considerable individual decision space.


17 FUTURE PERSPECTIVES


In the first nine chapters of this book, we covered the basic foundations of the
theory of fuzzy sets as they can be considered undisputed today.
Many more concepts and theories could not be discussed, either because of space
limitations, because they cannot yet be considered ready for a textbook, or because
they are too specific and advanced for the goal of this textbook. It was already
mentioned in the preface that nowadays more than 30,000 publications in the area
of fuzzy set theory and computational intelligence exist. It is obvious that they
cannot all be covered in such a textbook. I hope, however, that after studying this
book the reader will be in a position to read, understand, and evaluate most of the
papers and books that are being published now. Hopefully the reader has also
obtained some feeling for how and to what type of problems this technology can be
applied.

Fuzzy set theory is certainly not a philosopher’s stone that solves all the prob- 
lems that confront us today. But it has considerable potential for practical as well 
as for mathematical applications, the latter of which have not been discussed at 
all in this book. 

To indicate the scope of future applications of fuzzy set theory, we shall point
to some of the most relevant subject areas. Researchers have become more and
more conscious that we should be less certain about uncertainty than we have
been in the past. The management of uncertainty (that is, uncertainty due to lack
of knowledge or evidence, due to an abundance of complexity and information,
or due to the fast and unpredictable development of scientific, political, social,
and other structures nowadays) will be of growing importance in the future.

In fact, in practice the “fuzzy epoch” has already begun. There already exist 
quite a number of expert systems and expert-system shells that use fuzzy sets 
either in the form of linguistic variables or in the inference process (see [Gupta 
and Yamakawa 1988a]). Fuzzy computers were exhibited as early as 1987 in
Tokyo. Gupta and Yamakawa [1988a] provide a very good description of the 
present state of development. 

One of the advantages of fuzzy set theory is its extreme generality, which will 
enable it to accommodate quite a number of the new developments necessary for 
coping with existing and emerging problems and challenges. Some areas are 
already well developed, such as possibility theory [Dubois and Prade 1988a], 
fuzzy clustering, fuzzy control, fuzzy mathematical programming, etc. Other 
areas, however, have still ample space for further development. 

The area through which fuzzy set theory has primarily become known and attractive
to many scientists, students, and practitioners is certainly fuzzy control. Excellent
books, such as [Babuska 1988] and [Verbruggen et al. 1999], indicate extremely
well the present state of this area. Unfortunately, the attractiveness of this area has to
a large extent hidden the other potentials of fuzzy set theory. We hope that the
reader of this book has become aware of all the other, not yet exploited
possibilities of using this theory in many areas.

Considerably more research—formal as well as empirical—will be necessary 
in order to cope with these challenges. Much of this research will only be possi- 
ble through interdisciplinary team efforts. Let us indicate some of the research 
that is needed. Fuzzy set theory can be considered as a modeling language for 
vague and complex formal and factual structures. So far, mainly the min-max 
version of fuzzy set theory has been used and applied, even though many other 
connectives, concepts, and operations have been suggested in the literature. Mem- 
bership functions generally are supposed “to be given”. Therefore, much empir- 
ical research and good modeling effort is needed on connectives and on the 
measurement of membership functions to be able to use fuzzy set theory ade- 
quately as a modeling language. Great opportunities, not yet exploited, exist in 
the field of artificial intelligence. Most of the approaches and methods offered 
there so far have been dichotomous. If artificial intelligence really wants to be 
useful in capturing human thinking and perception, the phenomenon of uncer- 
tainty will have to be modeled much more adequately than has been done so far. 
Here, of course, fuzzy set theory offers many different opportunities. 

Very recently an even younger promising application area has emerged: that
of web technology. Large masses of data and information are being made
available without improving the human capability of perceiving complex structures in
detail. Intelligent agents, data mining, etc. might help to bridge this gap, and fuzzy
technology will undoubtedly find an as yet almost untouched field of research and
application. New areas, such as ecology and nuclear engineering, have also
already been shown to have great potential for fuzzy sets.

Another (at least potential) strength of fuzzy set theory is its algorithmic, com- 
putational promise. The more we realize that there are problems—the reader 
might, for instance, think of NP-complete problem structures, which are far too 
complex for existing traditional approaches (combinatorial programming, etc.) 
to cope with—the more the need for new computational avenues becomes 
apparent. 

In recent years fuzzy systems have been used, for instance, to solve systems of
differential equations efficiently (see [Bardossy 1996]), and one also finds some
other applications of that type in recent issues of Fuzzy Sets and Systems.

In general, however, fuzzy set theory has not yet proved to be computation- 
ally able to solve large and complex problems efficiently. Reasons for this are 
that for computation, either we still have to resort to traditional techniques (linear 
programming, branch and bound, traditional inference) or the additional infor- 
mation contained in fuzzy set models makes computations excessively volumi- 
nous. Here prudent standardization (support fuzzy logic, etc.) as well as good 
algorithmic combinations of heuristics and fuzzy set theory might offer some real 
promise. In other words, research in the direction of fuzzy algorithms is also 
urgently needed. 

Decision analysis has since 1970 been one of the prominent application areas 
of fuzzy set theory. In this comprehensive textbook only one chapter could be 
dedicated to this area. More details can be found in my book “Fuzzy Sets, Deci- 
sion Making and Expert Systems” [1987, third printing 1993] and other books 
and papers listed in the bibliography. It is hoped that further research efforts will 
advance this area and help to close still existing gaps. 


Abbreviations of Frequently Cited Journals 


ECECSR          Economic Computation and Economic Cybernetics Studies and Research
EIK             Elektronische Informationsverarbeitung und Kybernetik
EJOR            European Journal of Operational Research
FSS             Fuzzy Sets and Systems
JMAA            Journal of Mathematical Analysis and Applications
J.Op.Res.Soc.   Journal of the Operational Research Society
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engineering applications, 373—389 
gear boxes, 381—389 
machine tools, 375—380 
entropy, 49-50 
Epistemic logic, 151 
equality property, 238 
ESP, 213-220 
expert systems, 7, 185-222 
applications, 203—220, 401—404 
characteristics, 187—188 
definition, 188 
frames, 190-192, 203 
job shop scheduling application, 401—404 
linguistic description application, 203—206 
medical diagnosis application, 206-210 
production rules, 190, 202 
semantic nets, 190 
strategic planning application, 213-220 
structural damage assessment application, 
210-213 
techniques, 189-192 
uncertainty modeling in, 193-203 
extended addition, 62 
extended division, 63—64 
extended product, 62-63 
extended subtraction, 63 
extension principle 
definition, 55—56 
LR-representation of fuzzy sets, 64-68 
operations, 61—64 
set-theoretic operations defined by, 56-59 
extreme value strategy, 233 


factual models, 448, 449-453 

falsity, 146, 149 

fast Fourier transformation (FFT), 317 
FE-count, 200-201 
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FG-count, 200 
filter methods, 317 
first projection, 74 
FL-count, 200 
flexible manufacturing systems (FMSs) control 
application, 405-411 
forbidden zones, 235 
forests, 85 
formal models, 448 
forward chaining, 191 
frames, 190-192, 203 
Fril, 169-182 
applications, 181-182 
inference for single rule, 175—176 
inference methods, 172-175 
least prejudiced distribution and learning, 
179-181 
meta rules, 172 
multiple rules, 176-177 
point semantic unification, 177—179 
rules, 170-172 
functional fuzzy c-means algorithm (FFCM), 
314-316, 381 
functions 
classical, 94 
differentiation of, 107—108 
fuzzy, 93-99 
integration of, 99-106 
fuzziness, 3—4, 49-52 
fuzzy and operator, 36-37 
fuzzy clustering. see clustering 
fuzzy c-means algorithm (FCM), 294, 299-300, 
320, 433 
fuzzy controllers, 226-228 
applications, 244—255 
decision parameters, 228 
defuzzification strategies, 232-239 
design parameters, 240—243 
fuzzy sets, 240-242 
Mamdani, 227, 228-231, 241 
scaling factors, 240 
self-organizing, 243 
self-tuning, 243 
Sugeno controller, 239-240 
types, 228—240 
fuzzy c-partition, 290 
fuzzy differentiation, 107—108 
fuzzy domains, 94 
fuzzy dynamic programming, 348-352
fuzzy functions, 93-95
differentiation of, 107—108 
extrema of, 95—99 
integration of, 99-106 

fuzzy graphs, 83-85 

fuzzy intervals, 66 

fuzzy Kohonen network, 321 

fuzzy languages, 160-169 

fuzzy linear ordering, 88 


fuzzy linear programming (FLP), 336-348 
with crisp objective function, 342-348
flexible manufacturing systems control application, 405
logistics application, 398-401
symmetric, 337-342
fuzzy logic, 151—153, 194 


fuzzy logic control (FLC), 223-264. see also fuzzy controllers
adaptive, 243-244 
automatic control, 225—226 
extensions, 262 
origin, 223-225 
purpose, 223-224 
rules, 242-243 
stability, 257-262 
tools, 255 
fuzzy maximum, 96-99 
fuzzy measures, 47-49 
fuzzy multicriteria analysis, 352-365 
fuzzy numbers, 59 
LR-type, 64—66 
positive, 59 
triangular, 59 
fuzzy order relation, 88 
fuzzy partial order relation, 88 
fuzzy preorder relation, 88 
fuzzy production rules, 190, 202 
fuzzy quantifiers, 200 
fuzzy relational databases, 266-268 
fuzzy relations, 71-83, 86-89 
compositions of, 76-79 
cylindrical extension, 75—76 
definition, 71, 73 
intersection, 73-74 
projections, 74-75 
properties, 79-83 
types, 86-89 
union, 73-74 
fuzzy restrictions, 123
Fuzzy Rule Base, 319
fuzzy sets 
core, 233 
definition, 11—13 
design parameters, 240-242 
fuzzy functions on, 93—95 
LR-representation, 64—68 
operations for, 16—20, 27-44, 56-59 
possibility distributions and, 122-126 
support for, 14 
types, 23-27 
fuzzy set theory, 5-8 
advantages, 478 
applications, 371—442 
future perspectives, 477—479 
goals of, 6-8 
research in, 443-475 
fuzzy shell clustering (FSC) algorithm, 299-300
fuzzy singleton property, 237 
fuzzy subgraphs, 84 
fuzzyTECH, 255


gearboxes, fault detection in, 381—389 

generalized modus ponens, 158-159 

good implication operators, 158 

grade of membership. see membership function

graphs, fuzzy, 83-85 

graph-theoretic clustering method, 284 


Hamacher-operators, 32-34 
hierarchical clustering method, 281-282 
HMMS-model, 411, 412, 414, 416 
horizontal movement property, 237 


implication operators, 152, 157—159 

imprecision, 6 

increasing operations, 60—61 

indiscernibility, 27 

indistinguishability, 27 

inference engine, 191, 193 

input matching, 202 

instructor scheduling application, 419-426

integrals, 100-101
integration of fuzzy functions, 99-106
over crisp interval, 100-103
over fuzzy interval, 103-106
properties of integrals, 101-103
intersection 

Dubois and Prade definition, 34—35 

of fuzzy relations, 73—74 

Hamacher definition, 33 

membership function of, 16 

t-norms, 29, 30 

type 2 fuzzy sets, 57-58 

Yager definition, 34 
intersection-min-operator, 465-468 
interval-information, 117 
interval semantic unification, 173 
intuitionistic fuzzy sets (IFS), 25-27
inventory control applications, 426-431 
inventory planning applications, 411-417 


Jeffrey’s rule of inference, 173-174, 175 
job shop scheduling application, 401-404 


kiln control application, 249-255 
knowledge acquisition model, 188 
knowledge base, 188-189 

Kohonen feature map, 320 
Kolmogoroff’s probability, 3, 135-136 
Koopman’s probability, 3, 135-136 


languages, fuzzy, 160-169 
laws, 446 
least prejudiced distribution, 179-181 
left width, 240 
length of path, 85 
L-fuzzy set, 25 
linear programming. see fuzzy linear programming
linguistic approximation, 154 
linguistic hedge, 147 
linguistic information, 117-118 
linguistic modeling 
description, 203—206 
evaluation, machine tools, 375—380 
linguistic state space, 258 
linguistic trajectory, 259 
linguistic truth tables, 153-155
linguistic variables, 141—149 
Boolean, 148 
definition, 142 
logic 
Boolean, 149—150, 156 
Epistemic, 151 
fuzzy, 151-153 
Lukasiewicz, 155 
Modal, 150-151 
predicate calculus, 151 
logical consistency, 449 
logistics applications, 393-401 
fuzzy linear programming, 398—401 
transportation, 393-398 
lower probability, 195, 196, 197 
LR-representation of fuzzy sets, 64-68
Lukasiewicz logic, 155 


machine tools, linguistic evaluation and ranking, 375-380
maintenance 
management application, 322—323 
scheduling application, 418-419 
Mamdani controller, 227, 228-231 
management applications, 374, 389-440
inventory control, 426-431 
location, 390-393 
logistics, 393-401 
maintenance, 322-323 
marketing, 432-440 
scheduling, 401-426 
marketing applications, 432—440 
customer behavior, 433-440 
customer segmentation, banking and finance, 432-433
mass assignment theory, 173, 180 
matching, 201—203 
max-av composition, 76, 77—79, 83 
max-* composition, 76 
maximizing sets, 95-96, 345 
max-min composition 
definition, 76 
properties, 79-83 
max-prod composition, 76, 77—79, 83 
measure of belief, 197 
measure of plausibility, 197
measures of fuzziness, 49-52
distance between fuzzy set and complement, 51
entropy as, 49-50 
Yager definition, 51 
membership functions, 12, 16 
complement, 17 
condition width, 242 
intersection, 16 
left width, 240 
modal/peak value, 240 
range, 12 
research on, 453-463 
right width, 240 
union, 17 
membership functions of objective function, 346-347
Modal logic, 150-151 
modal value, 240 
model car control application, 246-248 
models 
classification of, 448 
definition, 444—446, 447 
operations research, 447—449 
testing, 449-453 
theory relationship to, 446 
Type A-membership model, 454-456 
Type B-membership model, 456—463 
modifiers, 147 
modus ponens, 150, 156 
monotony property, 237 
mth power, 28 
μ-length of path, 85
Multi Attribute Decision Making (MADM), 352, 359-365
Baas and Kwakernaak model, 362-365 
stages, 359 
Yager model, 360-362 
multi criteria analysis, 232 
multicriteria analysis, 352-365 
multidimensional functions, 312-313 
multilayer perceptron, 319 
Multi Objective Decision Making (MODM), 352, 353-359


necessity measures, 128, 198 
negative fuzzy numbers, 59—60 
normalized rule bases, 255
not operator, 152
numerical information, 117 


objective-function clustering methods, 284 
objects, 278 
operations, 27-44
algebraic, 28-29, 59-68
fuzzy logic, 149, 151
implication, 152, 157-159
logical, 152
properties, 60-61
research on, 463-474
selection criteria for, 43-44
set-theoretic, 16-20, 29-43
for type 2 fuzzy sets, 56-59
optimal compromise solutions, 353-354 
optimal decisions, 95 
optimal values of the objective function, 343 
Ordered Weighted Averaging (OWA) operators, 39-43
or operator, 152
output matching, 202 


particularization, 162 
partition coefficient, 297 
partition entropy, 297-298 
paths, 85 
pattern recognition. see data analysis 
peak value, 240 
perfect fuzzy order relation, 88 
perfectly antisymmetric relation, 80-81 
petrochemical plants maintenance management, 322-323
plausibility measure, 197 
plausible reasoning, 156—160 
point semantic unification, 177—179 
pointwise similarity, 311—312 
positive fuzzy numbers, 59-60 
possibilistic c-means algorithm (PCM), 301 
possibility distribution, 122-123, 124, 132, 161 
possibility measures, 48, 126-128, 198 
possibility/probability consistency principle, 126-127
possibility theory, 122-129, 151 
fuzzy sets and distribution, 122—126 
necessity measures, 128 
possibility measures, 126-128
probability compared, 133-138
qualification, 194, 198-199 
predicate calculus, 151 
prescriptive models, 448 
probabilistic set A, 25 
probabilistic sum, 28 
probability measures, 136 
probability of fuzzy events, 129-133 
definition, 130, 131 
as fuzzy set, 131-133 
possibility compared, 133-138 
qualification, 194-198 
as scalar, 129-131 
procedural knowledge, 188 
production planning and control (PPC), 405 
production rules, 190, 202 
properties 
binary operations, 60—61 
defuzzification strategies, 237—238 
extended operations, 62—63 
factual models, 449 
integrals of fuzzy functions, 101—103 
max-min composition, 79-83 
proportion exponent, 298 
PRUF (Possibilistic Relational Universal Fuzzy), 160-169
translation rules, 163-169 
Type I rules, 164—165 
Type II rules, 165-166 
Type III rules, 167 
Type IV rules, 167-169 


quantification, 199-201 


reflexivity, 79-80 
relational assignment equation, 123 
relational databases, 266—268 
relative cardinality, 16 
relaxation, 7 
research, 443-475 
on aggregators, 463-474 
laws in, 446 
on membership functions, 453-463 
models in, 443-446, 447-453 
theories in, 446 
right width, 240 
rough sets, 27


scheduling applications, 401-426
aggregate production and inventory planning, 411-417
courses, instructors and classrooms, 419-426
flexible manufacturing systems (FMS) control, 405-411
job shop scheduling, 401-404
maintenance scheduling, 418-419

second projection, 74 

semantic nets, 190 

semantic rule, 142 

set-theoretic operations, 16-20, 29-43 
averaging, 36—39 
Ordered Weighted Averaging (OWA), 39-43
t-conorms, 30-36 
t-norms, 30 

shape functions, 64 

similarity of functions, 307-313 
multi-dimensional functions, 312-313 
pointwise, 311—312 
structural, 307-311 

similarity relation, 87-88 

similarity trees, 87 

simple plant location model (SPLP), 390 

s-norms, 30-36 

SPERIL I, 210-213 

stochastic fuzzy model, 24 

stochastic uncertainty, 3 

strength of path, 85 

strong α-cut, 14

strong α-level set, 14

strong vertical translation property, 238 

structural similarity, 307—311 

structured variables, 147 

subgraphs, fuzzy, 84 

Sugeno controller, 239-240 

support logic programming, 169 

support of fuzzy set, 14 

symbolic information, 118 

symmetry, 80-82, 241-242 


t-conorm property, 238 

t-conorms (triangular conorms), 29, 30-36 
terms, 142 

theories, 446 

theory of evidence, 195 

t-norm property, 238 

t-norms (triangular norms), 29, 30
total fuzzy order relation, 88
total projection, 74 
transitivity, 82-83 
transportation application, 393-398 
trees, 85 
truth, 145—146, 149 
truth of proposition, 131 
truth qualification, 194 
truth tables, 149-150 
fuzzy, 154-155 
linguistic, 153-155 
truth values, 149, 151, 156 
Type A-membership model, 454—456 
Type B-membership model, 456—463 
type 2 fuzzy sets 
definition, 24 
operations for, 56-59 
type m fuzzy sets, 24 


uncertainty 
causes, 113, 114-116 
definition, 4—5, 114 
descriptions of, 113 
measures of, 112 
theories, 120 

uncertainty modeling, 6-7, 111-138 
application-oriented, 111—122 
expert systems, 193—203 
information available and, 117—118 
matching, 201-203
methods, 118-119
possibility qualification, 194, 198-199 
possibility theory, 122-129, 133-138, 151 
probability qualification, 194-198 
probability theory, 129-138 
quantification, 199-201 
as transformer of information, 119—120 
uncertain phenomena and, 120-122 
union, 468-474 
Dubois and Prade definition, 35 
of fuzzy relations, 73—74 
Hamacher definition, 33-34 
membership function of, 17 
t-conorms, 29, 30-36 
type 2 fuzzy sets, 57 
Yager definition, 34 
unitary possibility distribution function, 161 
unit possibility distribution, 161 
university scheduling application, 419-426 
upper probability, 195, 196, 197 
usefulness, 449 


vagueness, 1-3 

variable of higher order, 141 
variables, linguistic, 122, 141-149 
variance criterion, 291—292 
vector-maximum problem, 353 


Yager-operators, 34 


